[skip ci] Documentation updates
felixdittrich92 committed Feb 8, 2024
1 parent 733248b commit bcf6afc
Showing 44 changed files with 982 additions and 22 deletions.
Binary file modified .doctrees/environment.pickle
Binary file not shown.
@@ -360,6 +360,8 @@ <h1>Source code for doctr.models.detection.differentiable_binarization.tensorflo
<span class="sd"> ----</span>
<span class="sd"> feature extractor: the backbone serving as feature extractor</span>
<span class="sd"> fpn_channels: number of channels each extracted feature map is mapped to</span>
<span class="sd"> bin_thresh: threshold for binarization</span>
<span class="sd"> box_thresh: minimal objectness score to consider a box</span>
<span class="sd"> assume_straight_pages: if True, fit straight bounding boxes only</span>
<span class="sd"> exportable: onnx exportable returns only logits</span>
<span class="sd"> cfg: the configuration dict of the model</span>
@@ -373,6 +375,7 @@ <h1>Source code for doctr.models.detection.differentiable_binarization.tensorflo
<span class="n">feature_extractor</span><span class="p">:</span> <span class="n">IntermediateLayerGetter</span><span class="p">,</span>
<span class="n">fpn_channels</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">128</span><span class="p">,</span> <span class="c1"># to be set to 256 to represent the author&#39;s initial idea</span>
<span class="n">bin_thresh</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.3</span><span class="p">,</span>
<span class="n">box_thresh</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.1</span><span class="p">,</span>
<span class="n">assume_straight_pages</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">True</span><span class="p">,</span>
<span class="n">exportable</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span>
<span class="n">cfg</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
@@ -407,7 +410,9 @@ <h1>Source code for doctr.models.detection.differentiable_binarization.tensorflo
<span class="n">layers</span><span class="o">.</span><span class="n">Conv2DTranspose</span><span class="p">(</span><span class="n">num_classes</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">strides</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">kernel_initializer</span><span class="o">=</span><span class="s2">&quot;he_normal&quot;</span><span class="p">),</span>
<span class="p">])</span>

<span class="bp">self</span><span class="o">.</span><span class="n">postprocessor</span> <span class="o">=</span> <span class="n">DBPostProcessor</span><span class="p">(</span><span class="n">assume_straight_pages</span><span class="o">=</span><span class="n">assume_straight_pages</span><span class="p">,</span> <span class="n">bin_thresh</span><span class="o">=</span><span class="n">bin_thresh</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">postprocessor</span> <span class="o">=</span> <span class="n">DBPostProcessor</span><span class="p">(</span>
<span class="n">assume_straight_pages</span><span class="o">=</span><span class="n">assume_straight_pages</span><span class="p">,</span> <span class="n">bin_thresh</span><span class="o">=</span><span class="n">bin_thresh</span><span class="p">,</span> <span class="n">box_thresh</span><span class="o">=</span><span class="n">box_thresh</span>
<span class="p">)</span>

<span class="k">def</span> <span class="nf">compute_loss</span><span class="p">(</span>
<span class="bp">self</span><span class="p">,</span>
@@ -346,6 +346,8 @@ <h1>Source code for doctr.models.detection.linknet.tensorflow</h1><div class="hi
<span class="sd"> ----</span>
<span class="sd"> feature extractor: the backbone serving as feature extractor</span>
<span class="sd"> fpn_channels: number of channels each extracted feature map is mapped to</span>
<span class="sd"> bin_thresh: threshold for binarization of the output feature map</span>
<span class="sd"> box_thresh: minimal objectness score to consider a box</span>
<span class="sd"> assume_straight_pages: if True, fit straight bounding boxes only</span>
<span class="sd"> exportable: onnx exportable returns only logits</span>
<span class="sd"> cfg: the configuration dict of the model</span>
@@ -359,6 +361,7 @@ <h1>Source code for doctr.models.detection.linknet.tensorflow</h1><div class="hi
<span class="n">feat_extractor</span><span class="p">:</span> <span class="n">IntermediateLayerGetter</span><span class="p">,</span>
<span class="n">fpn_channels</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">64</span><span class="p">,</span>
<span class="n">bin_thresh</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.1</span><span class="p">,</span>
<span class="n">box_thresh</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.1</span><span class="p">,</span>
<span class="n">assume_straight_pages</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">True</span><span class="p">,</span>
<span class="n">exportable</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span>
<span class="n">cfg</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
@@ -400,7 +403,9 @@ <h1>Source code for doctr.models.detection.linknet.tensorflow</h1><div class="hi
<span class="p">),</span>
<span class="p">])</span>

<span class="bp">self</span><span class="o">.</span><span class="n">postprocessor</span> <span class="o">=</span> <span class="n">LinkNetPostProcessor</span><span class="p">(</span><span class="n">assume_straight_pages</span><span class="o">=</span><span class="n">assume_straight_pages</span><span class="p">,</span> <span class="n">bin_thresh</span><span class="o">=</span><span class="n">bin_thresh</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">postprocessor</span> <span class="o">=</span> <span class="n">LinkNetPostProcessor</span><span class="p">(</span>
<span class="n">assume_straight_pages</span><span class="o">=</span><span class="n">assume_straight_pages</span><span class="p">,</span> <span class="n">bin_thresh</span><span class="o">=</span><span class="n">bin_thresh</span><span class="p">,</span> <span class="n">box_thresh</span><span class="o">=</span><span class="n">box_thresh</span>
<span class="p">)</span>

<span class="k">def</span> <span class="nf">compute_loss</span><span class="p">(</span>
<span class="bp">self</span><span class="p">,</span>
47 changes: 47 additions & 0 deletions latest/_sources/using_doctr/using_models.rst.txt
@@ -398,3 +398,50 @@ For reference, here is a sample XML byte string output:
</div>
</body>
</html>
Advanced options
^^^^^^^^^^^^^^^^
We provide a few advanced options to customize the behavior of the predictor to your needs:

* Modify the binarization threshold for the detection model.
* Modify the box threshold for the detection model.

A higher threshold detects (possibly fewer) text regions more accurately, while a lower threshold detects more text regions.


.. code:: python3

    import numpy as np
    from doctr.models import ocr_predictor
    predictor = ocr_predictor('db_resnet50', 'crnn_vgg16_bn', pretrained=True)

    # Modify the binarization threshold and the box threshold
    predictor.det_predictor.model.postprocessor.bin_thresh = 0.5
    predictor.det_predictor.model.postprocessor.box_thresh = 0.2

    input_page = (255 * np.random.rand(800, 600, 3)).astype(np.uint8)
    out = predictor([input_page])

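To make the effect of the two thresholds more concrete, here is a simplified NumPy sketch. It is not docTR's postprocessor: the function name ``detect_single_region`` is hypothetical, and it assumes a single text region whose candidate box is the bounding box of all pixels at or above ``bin_thresh``, scored by the mean probability inside that box.

```python
import numpy as np


def detect_single_region(prob_map, bin_thresh=0.3, box_thresh=0.1):
    """Toy illustration (hypothetical helper, not part of docTR).

    Returns (xmin, ymin, xmax, ymax, score) in pixels, or None if nothing is kept.
    """
    # Step 1: binarization, keep pixels whose probability reaches bin_thresh
    bitmap = prob_map >= bin_thresh
    if not bitmap.any():
        return None

    # Step 2: candidate box = bounding box of the remaining pixels
    ys, xs = np.nonzero(bitmap)
    xmin, xmax = int(xs.min()), int(xs.max()) + 1
    ymin, ymax = int(ys.min()), int(ys.max()) + 1

    # Step 3: objectness score = mean probability inside the candidate box
    score = float(prob_map[ymin:ymax, xmin:xmax].mean())

    # Step 4: discard the box if its score is below box_thresh
    if score < box_thresh:
        return None
    return (xmin, ymin, xmax, ymax, score)


# A fake 8x8 probability map with one "text" blob of probability 0.6
prob_map = np.zeros((8, 8), dtype=np.float32)
prob_map[2:5, 1:7] = 0.6

print(detect_single_region(prob_map, bin_thresh=0.3, box_thresh=0.1))  # blob is kept
print(detect_single_region(prob_map, bin_thresh=0.7, box_thresh=0.1))  # removed by binarization
print(detect_single_region(prob_map, bin_thresh=0.3, box_thresh=0.8))  # removed by the box score
```

In this toy setting, raising ``bin_thresh`` shrinks or removes candidate regions, while raising ``box_thresh`` drops low-confidence boxes after they have been found.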
* Add a hook to the ``ocr_predictor`` to manipulate the location predictions before the crops are passed to the recognition model.

.. code:: python3

    from doctr.models import ocr_predictor

    class CustomHook:
        def __call__(self, loc_preds):
            # Manipulate the location predictions here
            # 1. The output structure needs to be the same as the input location predictions
            # 2. Be aware that the coordinates are relative and need to be between 0 and 1
            return loc_preds

    my_hook = CustomHook()

    predictor = ocr_predictor(pretrained=True)
    # Add a hook in the middle of the pipeline
    predictor.add_hook(my_hook)
    # You can also add multiple hooks which will be executed sequentially
    for hook in [my_hook, my_hook, my_hook]:
        predictor.add_hook(hook)
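
As a slightly more concrete hook, the sketch below clips every coordinate into the ``[0, 1]`` range. The class name ``ClipToPageHook`` is hypothetical, and the sketch assumes that ``loc_preds`` is a list with one NumPy array per page holding only relative coordinates; if your docTR version appends other columns (for example a score), restrict the clipping to the coordinate columns.

```python
import numpy as np


class ClipToPageHook:
    """Hypothetical hook: clip relative coordinates into [0, 1].

    Assumes loc_preds is a list of NumPy arrays (one per page) that
    contain only relative coordinates.
    """

    def __call__(self, loc_preds):
        # Return a list of the same length with arrays of the same shape,
        # so the output structure matches the input structure
        return [np.clip(page_preds, 0.0, 1.0) for page_preds in loc_preds]


# Stand-in for real detection output: one page, one box slightly off the page
fake_loc_preds = [np.array([[-0.05, 0.10, 0.50, 1.20]])]
clipped = ClipToPageHook()(fake_loc_preds)
print(clipped[0])
```

Such a hook would be registered in the same way as ``CustomHook`` above, through ``predictor.add_hook(ClipToPageHook())``.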
Binary file modified latest/objects.inv
Binary file not shown.
2 changes: 1 addition & 1 deletion latest/searchindex.js

Large diffs are not rendered by default.

44 changes: 44 additions & 0 deletions latest/using_doctr/using_models.html
@@ -976,6 +976,49 @@ <h3>What should I do with the output?<a class="headerlink" href="#what-should-i-
</pre></div>
</div>
</section>
<section id="advanced-options">
<h3>Advanced options<a class="headerlink" href="#advanced-options" title="Permalink to this heading">#</a></h3>
<p>We provide a few advanced options to customize the behavior of the predictor to your needs:</p>
<ul class="simple">
<li><p>Modify the binarization threshold for the detection model.</p></li>
<li><p>Modify the box threshold for the detection model.</p></li>
</ul>
<p>A higher threshold detects (possibly fewer) text regions more accurately, while a lower threshold detects more text regions.</p>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">from</span> <span class="nn">doctr.models</span> <span class="kn">import</span> <span class="n">ocr_predictor</span>
<span class="n">predictor</span> <span class="o">=</span> <span class="n">ocr_predictor</span><span class="p">(</span><span class="s1">&#39;db_resnet50&#39;</span><span class="p">,</span> <span class="s1">&#39;crnn_vgg16_bn&#39;</span><span class="p">,</span> <span class="n">pretrained</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>

<span class="c1"># Modify the binarization threshold and the box threshold</span>
<span class="n">predictor</span><span class="o">.</span><span class="n">det_predictor</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">postprocessor</span><span class="o">.</span><span class="n">bin_thresh</span> <span class="o">=</span> <span class="mf">0.5</span>
<span class="n">predictor</span><span class="o">.</span><span class="n">det_predictor</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">postprocessor</span><span class="o">.</span><span class="n">box_thresh</span> <span class="o">=</span> <span class="mf">0.2</span>

<span class="n">input_page</span> <span class="o">=</span> <span class="p">(</span><span class="mi">255</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">800</span><span class="p">,</span> <span class="mi">600</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">uint8</span><span class="p">)</span>
<span class="n">out</span> <span class="o">=</span> <span class="n">predictor</span><span class="p">([</span><span class="n">input_page</span><span class="p">])</span>
</pre></div>
</div>
<ul class="simple">
<li><p>Add a hook to the <cite>ocr_predictor</cite> to manipulate the location predictions before the crops are passed to the recognition model.</p></li>
</ul>
<div class="highlight-python3 notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">doctr.models</span> <span class="kn">import</span> <span class="n">ocr_predictor</span>

<span class="k">class</span> <span class="nc">CustomHook</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">loc_preds</span><span class="p">):</span>
        <span class="c1"># Manipulate the location predictions here</span>
        <span class="c1"># 1. The output structure needs to be the same as the input location predictions</span>
        <span class="c1"># 2. Be aware that the coordinates are relative and need to be between 0 and 1</span>
        <span class="k">return</span> <span class="n">loc_preds</span>

<span class="n">my_hook</span> <span class="o">=</span> <span class="n">CustomHook</span><span class="p">()</span>

<span class="n">predictor</span> <span class="o">=</span> <span class="n">ocr_predictor</span><span class="p">(</span><span class="n">pretrained</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
<span class="c1"># Add a hook in the middle of the pipeline</span>
<span class="n">predictor</span><span class="o">.</span><span class="n">add_hook</span><span class="p">(</span><span class="n">my_hook</span><span class="p">)</span>
<span class="c1"># You can also add multiple hooks which will be executed sequentially</span>
<span class="k">for</span> <span class="n">hook</span> <span class="ow">in</span> <span class="p">[</span><span class="n">my_hook</span><span class="p">,</span> <span class="n">my_hook</span><span class="p">,</span> <span class="n">my_hook</span><span class="p">]:</span>
    <span class="n">predictor</span><span class="o">.</span><span class="n">add_hook</span><span class="p">(</span><span class="n">hook</span><span class="p">)</span>
</pre></div>
</div>
</section>
</section>
</section>

@@ -1049,6 +1092,7 @@ <h3>What should I do with the output?<a class="headerlink" href="#what-should-i-
<li><a class="reference internal" href="#id2">Available architectures</a></li>
<li><a class="reference internal" href="#two-stage-approaches">Two-stage approaches</a></li>
<li><a class="reference internal" href="#what-should-i-do-with-the-output">What should I do with the output?</a></li>
<li><a class="reference internal" href="#advanced-options">Advanced options</a></li>
</ul>
</li>
</ul>
@@ -360,6 +360,8 @@ <h1>Source code for doctr.models.detection.differentiable_binarization.tensorflo
<span class="sd"> ----</span>
<span class="sd"> feature extractor: the backbone serving as feature extractor</span>
<span class="sd"> fpn_channels: number of channels each extracted feature map is mapped to</span>
<span class="sd"> bin_thresh: threshold for binarization</span>
<span class="sd"> box_thresh: minimal objectness score to consider a box</span>
<span class="sd"> assume_straight_pages: if True, fit straight bounding boxes only</span>
<span class="sd"> exportable: onnx exportable returns only logits</span>
<span class="sd"> cfg: the configuration dict of the model</span>
@@ -373,6 +375,7 @@ <h1>Source code for doctr.models.detection.differentiable_binarization.tensorflo
<span class="n">feature_extractor</span><span class="p">:</span> <span class="n">IntermediateLayerGetter</span><span class="p">,</span>
<span class="n">fpn_channels</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">128</span><span class="p">,</span> <span class="c1"># to be set to 256 to represent the author&#39;s initial idea</span>
<span class="n">bin_thresh</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.3</span><span class="p">,</span>
<span class="n">box_thresh</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="mf">0.1</span><span class="p">,</span>
<span class="n">assume_straight_pages</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">True</span><span class="p">,</span>
<span class="n">exportable</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span>
<span class="n">cfg</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">Any</span><span class="p">]]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
@@ -407,7 +410,9 @@ <h1>Source code for doctr.models.detection.differentiable_binarization.tensorflo
<span class="n">layers</span><span class="o">.</span><span class="n">Conv2DTranspose</span><span class="p">(</span><span class="n">num_classes</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">strides</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">kernel_initializer</span><span class="o">=</span><span class="s2">&quot;he_normal&quot;</span><span class="p">),</span>
<span class="p">])</span>

<span class="bp">self</span><span class="o">.</span><span class="n">postprocessor</span> <span class="o">=</span> <span class="n">DBPostProcessor</span><span class="p">(</span><span class="n">assume_straight_pages</span><span class="o">=</span><span class="n">assume_straight_pages</span><span class="p">,</span> <span class="n">bin_thresh</span><span class="o">=</span><span class="n">bin_thresh</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">postprocessor</span> <span class="o">=</span> <span class="n">DBPostProcessor</span><span class="p">(</span>
<span class="n">assume_straight_pages</span><span class="o">=</span><span class="n">assume_straight_pages</span><span class="p">,</span> <span class="n">bin_thresh</span><span class="o">=</span><span class="n">bin_thresh</span><span class="p">,</span> <span class="n">box_thresh</span><span class="o">=</span><span class="n">box_thresh</span>
<span class="p">)</span>

<span class="k">def</span> <span class="nf">compute_loss</span><span class="p">(</span>
<span class="bp">self</span><span class="p">,</span>