get_text_range: slightly enhance docs
mara004 committed Nov 29, 2023
1 parent 5886522 commit 6a0a67b
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions src/pypdfium2/_helpers/textpage.py
@@ -62,13 +62,12 @@ def get_text_range(self, index=0, count=-1, errors="ignore"):
         Returns:
             str: The text in the range in question, or an empty string if no text was found.

         Important:
-            The returned text's length does not have to match *count*, even if it will for most PDFs.
-            This is because the underlying API may exclude/insert chars compared to the internal list, although rare in practice.
-            This means, if the char at ``i`` is excluded, ``get_text_range(i, 2)[1]`` will raise an index error.
-            Pdfium provides raw APIs ``FPDFText_GetTextIndexFromCharIndex() / FPDFText_GetCharIndexFromTextIndex()`` to translate between the two views and identify excluded/inserted chars.
-        Note:
-            In case of leading/trailing excluded characters, pypdfium2 modifies *index* and *count* accordingly to prevent pdfium from unexpectedly reading beyond ``range(index, index+count)``.
+            * The returned text's length does not have to match *count*, even if it will for most PDFs.
+              This is because the underlying API may exclude/insert chars compared to the internal list, although rare in practice.
+              This means, if the char at ``i`` is excluded, ``get_text_range(i, 2)[1]`` will raise an index error.
+              Pdfium provides raw APIs ``FPDFText_GetTextIndexFromCharIndex()`` / ``FPDFText_GetCharIndexFromTextIndex()`` to translate between the two views and identify excluded/inserted chars.
+            * In case of leading/trailing excluded characters, pypdfium2 modifies *index* and *count* accordingly to prevent pdfium from unexpectedly reading beyond ``range(index, index+count)``.
         """

         if count == -1:
@@ -78,6 +77,7 @@ def get_text_range(self, index=0, count=-1, errors="ignore"):
         if active_range == 0:
             return ""

+        # NOTE since we have converted indices from char to text, they will shift accordingly for inserted/excluded chars, so this will calculate the exact output size
         t_start, t_end, l_passive, r_passive = active_range
         index += l_passive
         count -= l_passive + r_passive
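
The index/count adjustment in this hunk can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions, not pypdfium2's implementation: the char-to-text mapping is a plain Python list in which `-1` (named `EXCLUDED` here) stands for an excluded char, and `active_text_range()` and `adjusted_request()` are hypothetical helpers written for this example. The output size `t_end - t_start + 1` is likewise an assumption about what the NOTE comment refers to.

```python
# Toy model only: a list stands in for the char -> text index translation.
# EXCLUDED, active_text_range() and adjusted_request() are hypothetical names.
EXCLUDED = -1  # this char has no counterpart in the text view


def active_text_range(text_index_of, c_start, c_end):
    """Trim excluded chars off both ends of the inclusive char range.

    Returns (t_start, t_end, l_passive, r_passive), or 0 if nothing remains.
    """
    l_passive = r_passive = 0
    while c_start <= c_end and text_index_of[c_start] == EXCLUDED:
        c_start += 1
        l_passive += 1
    while c_end >= c_start and text_index_of[c_end] == EXCLUDED:
        c_end -= 1
        r_passive += 1
    if c_start > c_end:
        return 0
    return text_index_of[c_start], text_index_of[c_end], l_passive, r_passive


def adjusted_request(text_index_of, index, count):
    """Mirror the adjustment from the diff; None means 'no text in range'."""
    active_range = active_text_range(text_index_of, index, index + count - 1)
    if active_range == 0:
        return None
    t_start, t_end, l_passive, r_passive = active_range
    index += l_passive
    count -= l_passive + r_passive
    # Assumed output size: the span between the converted text indices.
    out_len = t_end - t_start + 1
    return index, count, out_len


# Six chars; the first and the fourth are excluded from the text view.
mapping = [EXCLUDED, 0, 1, EXCLUDED, 2, 3]
print(adjusted_request(mapping, 0, 6))
```

In this model the leading excluded char shifts *index* by one and shrinks *count* by one, while the excluded char in the middle of the range is what makes the output size differ from the adjusted *count*.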