[skip ci] Documentation updates

mindee · Feb 28, 2024 · af81e06
1 parent 306fe1b
commit af81e06

Showing 33 changed files with 412 additions and 12 deletions.
diff --git a/.doctrees/environment.pickle b/.doctrees/environment.pickle
diff --git a/latest/_sources/using_doctr/using_models.rst.txt b/latest/_sources/using_doctr/using_models.rst.txt
@@ -279,6 +279,19 @@ For instance, this snippet instantiates an end-to-end ocr_predictor working with
     from doctr.model import ocr_predictor
     model = ocr_predictor('linknet_resnet18', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
 
+To modify the output structure you can pass the following arguments to the predictor which will be handled by the underlying `DocumentBuilder`:
+
+* `resolve_lines`: whether words should be automatically grouped into lines (default: True)
+* `resolve_blocks`: whether lines should be automatically grouped into blocks (default: True)
+* `paragraph_break`: relative length of the minimum space separating paragraphs (default: 0.035)
+
+For example to disable the automatic grouping of lines into blocks:
+
+.. code:: python3
+
+    from doctr.model import ocr_predictor
+    model = ocr_predictor(pretrained=True, resolve_blocks=False)
+
 
 What should I do with the output?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -304,6 +317,14 @@ Here is a typical `Document` layout::
     )]
   )
 
+To get only the text content of the `Document`, you can use the `render` method::
+
+  text_output = result.render()
+
+For reference, here is the output for the `Document` above::
+
+  No. RECEIPT DATE
+
 You can also export them as a nested dict, more appropriate for JSON format::
 
   json_output = result.export()

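The hunks above describe `render` as returning only the text content and `export` as returning a nested dict. As a rough illustration of how the two relate, the sketch below flattens an export-style dict into plain text without importing docTR. The dict layout (`pages`, `blocks`, `lines`, `words`, `value`), the helper name `render_text`, and the separators are assumptions made for this example, not the library's confirmed implementation.

```python
# Hypothetical, trimmed-down stand-in for the nested dict produced by export().
# Only the keys this sketch needs are included; the key names are assumptions.
exported = {
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "No."},
                                {"value": "RECEIPT"},
                                {"value": "DATE"},
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}


def render_text(doc, line_break="\n", block_break="\n\n", page_break="\n\n\n\n"):
    """Flatten an export-style dict into plain text.

    Words are joined with spaces, lines with `line_break`, blocks with
    `block_break`, and pages with `page_break` (separators are assumed).
    """
    pages = []
    for page in doc["pages"]:
        blocks = []
        for block in page["blocks"]:
            lines = [
                " ".join(word["value"] for word in line["words"])
                for line in block["lines"]
            ]
            blocks.append(line_break.join(lines))
        pages.append(block_break.join(blocks))
    return page_break.join(pages)


text_output = render_text(exported)
print(text_output)  # No. RECEIPT DATE
```

With a single page, block, and line, the result matches the one-line output shown in the diff. When the predictor is built with `resolve_blocks=False`, the grouping into blocks changes, so the block-level breaks in such a flattened text would likely differ.
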
diff --git a/latest/searchindex.js b/latest/searchindex.js
diff --git a/latest/using_doctr/using_models.html b/latest/using_doctr/using_models.html
@@ -836,6 +836,17 @@ <h3>Two-stage approaches<a class="headerlink" href="#two-stage-approaches" title
 <span class="n">model</span> <span class="o">=</span> <span class="n">ocr_predictor</span><span class="p">(</span><span class="s1">&#39;linknet_resnet18&#39;</span><span class="p">,</span> <span class="n">pretrained</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">assume_straight_pages</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">preserve_aspect_ratio</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
 </pre></div>
 </div>
+<p>To modify the output structure you can pass the following arguments to the predictor which will be handled by the underlying <cite>DocumentBuilder</cite>:</p>
+<ul class="simple">
+<li><p><cite>resolve_lines</cite>: whether words should be automatically grouped into lines (default: True)</p></li>
+<li><p><cite>resolve_blocks</cite>: whether lines should be automatically grouped into blocks (default: True)</p></li>
+<li><p><cite>paragraph_break</cite>: relative length of the minimum space separating paragraphs (default: 0.035)</p></li>
+</ul>
+<p>For example to disable the automatic grouping of lines into blocks:</p>
+<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">doctr.model</span> <span class="kn">import</span> <span class="n">ocr_predictor</span>
+<span class="n">model</span> <span class="o">=</span> <span class="n">ocr_predictor</span><span class="p">(</span><span class="n">pretrained</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">resolve_blocks</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
+</pre></div>
+</div>
 </section>
 <section id="what-should-i-do-with-the-output">
 <h3>What should I do with the output?<a class="headerlink" href="#what-should-i-do-with-the-output" title="Permalink to this heading">#</a></h3>
@@ -859,6 +870,14 @@ <h3>What should I do with the output?<a class="headerlink" href="#what-should-i-
 <span class="p">)</span>
 </pre></div>
 </div>
+<p>To get only the text content of the <cite>Document</cite>, you can use the <cite>render</cite> method:</p>
+<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">text_output</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">render</span><span class="p">()</span>
+</pre></div>
+</div>
+<p>For reference, here is the output for the <cite>Document</cite> above:</p>
+<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">No</span><span class="o">.</span> <span class="n">RECEIPT</span> <span class="n">DATE</span>
+</pre></div>
+</div>
 <p>You can also export them as a nested dict, more appropriate for JSON format:</p>
 <div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">json_output</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">export</span><span class="p">()</span>
 </pre></div>

diff --git a/v0.1.0/_sources/using_doctr/using_models.rst.txt b/v0.1.0/_sources/using_doctr/using_models.rst.txt
@@ -279,6 +279,19 @@ For instance, this snippet instantiates an end-to-end ocr_predictor working with
     from doctr.model import ocr_predictor
     model = ocr_predictor('linknet_resnet18', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
 
+To modify the output structure you can pass the following arguments to the predictor which will be handled by the underlying `DocumentBuilder`:
+
+* `resolve_lines`: whether words should be automatically grouped into lines (default: True)
+* `resolve_blocks`: whether lines should be automatically grouped into blocks (default: True)
+* `paragraph_break`: relative length of the minimum space separating paragraphs (default: 0.035)
+
+For example to disable the automatic grouping of lines into blocks:
+
+.. code:: python3
+
+    from doctr.model import ocr_predictor
+    model = ocr_predictor(pretrained=True, resolve_blocks=False)
+
 
 What should I do with the output?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -304,6 +317,14 @@ Here is a typical `Document` layout::
     )]
   )
 
+To get only the text content of the `Document`, you can use the `render` method::
+
+  text_output = result.render()
+
+For reference, here is the output for the `Document` above::
+
+  No. RECEIPT DATE
+
 You can also export them as a nested dict, more appropriate for JSON format::
 
   json_output = result.export()

diff --git a/v0.1.0/searchindex.js b/v0.1.0/searchindex.js
diff --git a/v0.1.0/using_doctr/using_models.html b/v0.1.0/using_doctr/using_models.html
@@ -836,6 +836,17 @@ <h3>Two-stage approaches<a class="headerlink" href="#two-stage-approaches" title
 <span class="n">model</span> <span class="o">=</span> <span class="n">ocr_predictor</span><span class="p">(</span><span class="s1">&#39;linknet_resnet18&#39;</span><span class="p">,</span> <span class="n">pretrained</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">assume_straight_pages</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">preserve_aspect_ratio</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
 </pre></div>
 </div>
+<p>To modify the output structure you can pass the following arguments to the predictor which will be handled by the underlying <cite>DocumentBuilder</cite>:</p>
+<ul class="simple">
+<li><p><cite>resolve_lines</cite>: whether words should be automatically grouped into lines (default: True)</p></li>
+<li><p><cite>resolve_blocks</cite>: whether lines should be automatically grouped into blocks (default: True)</p></li>
+<li><p><cite>paragraph_break</cite>: relative length of the minimum space separating paragraphs (default: 0.035)</p></li>
+</ul>
+<p>For example to disable the automatic grouping of lines into blocks:</p>
+<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">doctr.model</span> <span class="kn">import</span> <span class="n">ocr_predictor</span>
+<span class="n">model</span> <span class="o">=</span> <span class="n">ocr_predictor</span><span class="p">(</span><span class="n">pretrained</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">resolve_blocks</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
+</pre></div>
+</div>
 </section>
 <section id="what-should-i-do-with-the-output">
 <h3>What should I do with the output?<a class="headerlink" href="#what-should-i-do-with-the-output" title="Permalink to this heading">#</a></h3>
@@ -859,6 +870,14 @@ <h3>What should I do with the output?<a class="headerlink" href="#what-should-i-
 <span class="p">)</span>
 </pre></div>
 </div>
+<p>To get only the text content of the <cite>Document</cite>, you can use the <cite>render</cite> method:</p>
+<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">text_output</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">render</span><span class="p">()</span>
+</pre></div>
+</div>
+<p>For reference, here is the output for the <cite>Document</cite> above:</p>
+<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">No</span><span class="o">.</span> <span class="n">RECEIPT</span> <span class="n">DATE</span>
+</pre></div>
+</div>
 <p>You can also export them as a nested dict, more appropriate for JSON format:</p>
 <div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">json_output</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">export</span><span class="p">()</span>
 </pre></div>

diff --git a/v0.1.1/_sources/using_doctr/using_models.rst.txt b/v0.1.1/_sources/using_doctr/using_models.rst.txt
@@ -279,6 +279,19 @@ For instance, this snippet instantiates an end-to-end ocr_predictor working with
     from doctr.model import ocr_predictor
     model = ocr_predictor('linknet_resnet18', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
 
+To modify the output structure you can pass the following arguments to the predictor which will be handled by the underlying `DocumentBuilder`:
+
+* `resolve_lines`: whether words should be automatically grouped into lines (default: True)
+* `resolve_blocks`: whether lines should be automatically grouped into blocks (default: True)
+* `paragraph_break`: relative length of the minimum space separating paragraphs (default: 0.035)
+
+For example to disable the automatic grouping of lines into blocks:
+
+.. code:: python3
+
+    from doctr.model import ocr_predictor
+    model = ocr_predictor(pretrained=True, resolve_blocks=False)
+
 
 What should I do with the output?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -304,6 +317,14 @@ Here is a typical `Document` layout::
     )]
   )
 
+To get only the text content of the `Document`, you can use the `render` method::
+
+  text_output = result.render()
+
+For reference, here is the output for the `Document` above::
+
+  No. RECEIPT DATE
+
 You can also export them as a nested dict, more appropriate for JSON format::
 
   json_output = result.export()

diff --git a/v0.1.1/searchindex.js b/v0.1.1/searchindex.js
diff --git a/v0.1.1/using_doctr/using_models.html b/v0.1.1/using_doctr/using_models.html
@@ -836,6 +836,17 @@ <h3>Two-stage approaches<a class="headerlink" href="#two-stage-approaches" title
 <span class="n">model</span> <span class="o">=</span> <span class="n">ocr_predictor</span><span class="p">(</span><span class="s1">&#39;linknet_resnet18&#39;</span><span class="p">,</span> <span class="n">pretrained</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">assume_straight_pages</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">preserve_aspect_ratio</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
 </pre></div>
 </div>
+<p>To modify the output structure you can pass the following arguments to the predictor which will be handled by the underlying <cite>DocumentBuilder</cite>:</p>
+<ul class="simple">
+<li><p><cite>resolve_lines</cite>: whether words should be automatically grouped into lines (default: True)</p></li>
+<li><p><cite>resolve_blocks</cite>: whether lines should be automatically grouped into blocks (default: True)</p></li>
+<li><p><cite>paragraph_break</cite>: relative length of the minimum space separating paragraphs (default: 0.035)</p></li>
+</ul>
+<p>For example to disable the automatic grouping of lines into blocks:</p>
+<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">doctr.model</span> <span class="kn">import</span> <span class="n">ocr_predictor</span>
+<span class="n">model</span> <span class="o">=</span> <span class="n">ocr_predictor</span><span class="p">(</span><span class="n">pretrained</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">resolve_blocks</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
+</pre></div>
+</div>
 </section>
 <section id="what-should-i-do-with-the-output">
 <h3>What should I do with the output?<a class="headerlink" href="#what-should-i-do-with-the-output" title="Permalink to this heading">#</a></h3>
@@ -859,6 +870,14 @@ <h3>What should I do with the output?<a class="headerlink" href="#what-should-i-
 <span class="p">)</span>
 </pre></div>
 </div>
+<p>To get only the text content of the <cite>Document</cite>, you can use the <cite>render</cite> method:</p>
+<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">text_output</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">render</span><span class="p">()</span>
+</pre></div>
+</div>
+<p>For reference, here is the output for the <cite>Document</cite> above:</p>
+<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">No</span><span class="o">.</span> <span class="n">RECEIPT</span> <span class="n">DATE</span>
+</pre></div>
+</div>
 <p>You can also export them as a nested dict, more appropriate for JSON format:</p>
 <div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">json_output</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">export</span><span class="p">()</span>
 </pre></div>

diff --git a/v0.2.0/_sources/using_doctr/using_models.rst.txt b/v0.2.0/_sources/using_doctr/using_models.rst.txt
@@ -279,6 +279,19 @@ For instance, this snippet instantiates an end-to-end ocr_predictor working with
     from doctr.model import ocr_predictor
     model = ocr_predictor('linknet_resnet18', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
 
+To modify the output structure you can pass the following arguments to the predictor which will be handled by the underlying `DocumentBuilder`:
+
+* `resolve_lines`: whether words should be automatically grouped into lines (default: True)
+* `resolve_blocks`: whether lines should be automatically grouped into blocks (default: True)
+* `paragraph_break`: relative length of the minimum space separating paragraphs (default: 0.035)
+
+For example to disable the automatic grouping of lines into blocks:
+
+.. code:: python3
+
+    from doctr.model import ocr_predictor
+    model = ocr_predictor(pretrained=True, resolve_blocks=False)
+
 
 What should I do with the output?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -304,6 +317,14 @@ Here is a typical `Document` layout::
     )]
   )
 
+To get only the text content of the `Document`, you can use the `render` method::
+
+  text_output = result.render()
+
+For reference, here is the output for the `Document` above::
+
+  No. RECEIPT DATE
+
 You can also export them as a nested dict, more appropriate for JSON format::
 
   json_output = result.export()

diff --git a/v0.2.0/searchindex.js b/v0.2.0/searchindex.js
diff --git a/v0.2.0/using_doctr/using_models.html b/v0.2.0/using_doctr/using_models.html
@@ -836,6 +836,17 @@ <h3>Two-stage approaches<a class="headerlink" href="#two-stage-approaches" title
 <span class="n">model</span> <span class="o">=</span> <span class="n">ocr_predictor</span><span class="p">(</span><span class="s1">&#39;linknet_resnet18&#39;</span><span class="p">,</span> <span class="n">pretrained</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">assume_straight_pages</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">preserve_aspect_ratio</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
 </pre></div>
 </div>
+<p>To modify the output structure you can pass the following arguments to the predictor which will be handled by the underlying <cite>DocumentBuilder</cite>:</p>
+<ul class="simple">
+<li><p><cite>resolve_lines</cite>: whether words should be automatically grouped into lines (default: True)</p></li>
+<li><p><cite>resolve_blocks</cite>: whether lines should be automatically grouped into blocks (default: True)</p></li>
+<li><p><cite>paragraph_break</cite>: relative length of the minimum space separating paragraphs (default: 0.035)</p></li>
+</ul>
+<p>For example to disable the automatic grouping of lines into blocks:</p>
+<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">doctr.model</span> <span class="kn">import</span> <span class="n">ocr_predictor</span>
+<span class="n">model</span> <span class="o">=</span> <span class="n">ocr_predictor</span><span class="p">(</span><span class="n">pretrained</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">resolve_blocks</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
+</pre></div>
+</div>
 </section>
 <section id="what-should-i-do-with-the-output">
 <h3>What should I do with the output?<a class="headerlink" href="#what-should-i-do-with-the-output" title="Permalink to this heading">#</a></h3>
@@ -859,6 +870,14 @@ <h3>What should I do with the output?<a class="headerlink" href="#what-should-i-
 <span class="p">)</span>
 </pre></div>
 </div>
+<p>To get only the text content of the <cite>Document</cite>, you can use the <cite>render</cite> method:</p>
+<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">text_output</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">render</span><span class="p">()</span>
+</pre></div>
+</div>
+<p>For reference, here is the output for the <cite>Document</cite> above:</p>
+<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">No</span><span class="o">.</span> <span class="n">RECEIPT</span> <span class="n">DATE</span>
+</pre></div>
+</div>
 <p>You can also export them as a nested dict, more appropriate for JSON format:</p>
 <div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="n">json_output</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">export</span><span class="p">()</span>
 </pre></div>

diff --git a/v0.2.1/_sources/using_doctr/using_models.rst.txt b/v0.2.1/_sources/using_doctr/using_models.rst.txt
@@ -279,6 +279,19 @@ For instance, this snippet instantiates an end-to-end ocr_predictor working with
     from doctr.model import ocr_predictor
     model = ocr_predictor('linknet_resnet18', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
 
+To modify the output structure you can pass the following arguments to the predictor which will be handled by the underlying `DocumentBuilder`:
+
+* `resolve_lines`: whether words should be automatically grouped into lines (default: True)
+* `resolve_blocks`: whether lines should be automatically grouped into blocks (default: True)
+* `paragraph_break`: relative length of the minimum space separating paragraphs (default: 0.035)
+
+For example to disable the automatic grouping of lines into blocks:
+
+.. code:: python3
+
+    from doctr.model import ocr_predictor
+    model = ocr_predictor(pretrained=True, resolve_blocks=False)
+
 
 What should I do with the output?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -304,6 +317,14 @@ Here is a typical `Document` layout::
     )]
   )
 
+To get only the text content of the `Document`, you can use the `render` method::
+
+  text_output = result.render()
+
+For reference, here is the output for the `Document` above::
+
+  No. RECEIPT DATE
+
 You can also export them as a nested dict, more appropriate for JSON format::
 
   json_output = result.export()

diff --git a/v0.2.1/searchindex.js b/v0.2.1/searchindex.js