text_extractor: make decoding more robust
mara004 committed May 31, 2022
1 parent 62f4386 commit ff13d2f
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/pypdfium2/_helpers/text_extractor.py
@@ -52,7 +52,7 @@ def get_text(self, left=0, bottom=0, right=0, top=0):

c_array = (ctypes.c_ushort * (n_chars+1))()
pdfium.FPDFText_GetBoundedText(*args, ctypes.cast(c_array, ctypes.POINTER(ctypes.c_ushort)), n_chars)
-text = bytes(c_array).decode("utf-16-le")[:-1]
+text = bytes(c_array).decode("utf-16-le", errors="ignore")[:-1]

return text
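
As a rough illustration of what the changed line does, the sketch below decodes a hand-built UTF-16-LE buffer that contains a lone surrogate. The helper name `decode_buffer` and the sample bytes are hypothetical and not part of pypdfium2. Strict decoding raises `UnicodeDecodeError` on such input, while `errors="ignore"` drops the malformed code unit and keeps the rest.

```python
def decode_buffer(raw, errors="strict"):
    # Decode UTF-16-LE bytes and strip the trailing null terminator,
    # mirroring the slicing used in the changed line.
    return raw.decode("utf-16-le", errors=errors)[:-1]


# Hypothetical buffer: "A", a lone high surrogate (0xD800), "B", null terminator.
raw = b"A\x00" + b"\x00\xd8" + b"B\x00" + b"\x00\x00"

try:
    decode_buffer(raw)
    strict_failed = False
except UnicodeDecodeError:
    # The unpaired surrogate is not valid UTF-16, so strict decoding fails.
    strict_failed = True

# With errors="ignore", the invalid code unit is silently skipped.
lenient = decode_buffer(raw, errors="ignore")
```

The trade-off is that malformed code units are discarded without any warning, so the returned text may be shorter than the character count reported by the library.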