page.search_for not working for all text-part in pdf-document? #3838
Replies: 3 comments
-
Weird!
print(page.get_text())
Compliant
Scan Results for West School District
June 11, 2024
June 11, 2024
Compliant
The Chemical Hygiene Plan from West School isn’t OSHA compliant.
Your document is 82.35% aligned wi
th OSHA standards. Let’s change that!
46
We found 46 issues in your document
A1
A2
Is the goal of the Chemical Hygiene Plan described?
Is the Chemical Hygiene Plan reviewed annually?
Yes
B. Chemical Hygiene Personnel
No
B2
Is the goal for Chemical Hygiene Personnel described?
Yes
This audit is subjected to terms of service.
No

If you remove this "structure" information using the low-level API, everything behaves as expected.

cat = doc.pdf_catalog()
doc.xref_set_key(cat, "StructTreeRoot", "null")  # remove the structure info
# now proceed as normal
page = doc[...]
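For reuse, the same workaround could be wrapped in a small helper. This is only a sketch: the names `strip_structure_tree` and `search_after_strip` are made up for illustration, the `import pymupdf` spelling assumes a recent PyMuPDF release (older ones use `import fitz`), and the `xref_get_key` check assumes a missing key is reported as type `"null"`.

```python
def strip_structure_tree(doc):
    """Remove the StructTreeRoot entry from the PDF catalog, if present.

    `doc` is assumed to be an open PyMuPDF Document of a PDF file.
    Returns True if an entry was removed, False if there was none.
    """
    cat = doc.pdf_catalog()  # xref number of the PDF catalog
    kind, _value = doc.xref_get_key(cat, "StructTreeRoot")
    if kind == "null":  # assumption: a missing key is reported as "null"
        return False
    doc.xref_set_key(cat, "StructTreeRoot", "null")  # remove the structure info
    return True


def search_after_strip(path, needle, page_number=0):
    """Hypothetical convenience wrapper: open, strip, then search one page."""
    import pymupdf  # assumption: recent PyMuPDF; older releases use "import fitz"

    doc = pymupdf.open(path)
    strip_structure_tree(doc)
    page = doc[page_number]  # load the page only after changing the catalog
    return page.search_for(needle)
```

Calling something like `search_after_strip("test.pdf", "B1")` should then return the rectangles for a label that was previously not found, provided the structure tree really is the cause in that file.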
-
I submitted a bug report to MuPDF's bug tracker: https://bugs.ghostscript.com/show_bug.cgi?id=708005
-
Works great again - thx a lot!
-
Hello - I have the attached source file test.pdf and I am trying to change the appearance of
["A1", "A2", "B1", "B2", "P1", "I1", "C1", "C2"]
in the PDF document using the following code;
the output goes to out.pdf.
As you can see in the output, it seems that I can't find the entries
B1, C2, P1, I1
using page.search_for.
Why are these strings not found, while the other ones (A1, A2, B2, C1) are found?
Code:
out.pdf
test.pdf
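A minimal sketch of the kind of search loop described above, not the original code: it assumes "changing the appearance" means adding a highlight annotation, and the helper names (`find_labels`, `missing_labels`, `highlight_labels`) are invented for illustration. `page` is expected to be a PyMuPDF page.

```python
LABELS = ["A1", "A2", "B1", "B2", "P1", "I1", "C1", "C2"]


def find_labels(page, labels):
    """Map each label to the list of rectangles page.search_for returns."""
    return {label: page.search_for(label) for label in labels}


def missing_labels(hits):
    """Return the labels for which the search produced no rectangle."""
    return [label for label, rects in hits.items() if not rects]


def highlight_labels(page, labels):
    """Highlight every hit and return the labels that were not found."""
    hits = find_labels(page, labels)
    for rects in hits.values():
        for rect in rects:
            page.add_highlight_annot(rect)  # assumption: highlight as the "appearance" change
    return missing_labels(hits)
```

Printing the result of `highlight_labels(page, LABELS)` before saving to out.pdf makes it obvious which labels the search skipped, instead of having to inspect the output file.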