Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Warn on unicode decoding errors in PDF annotations #1195

Open
wants to merge 1 commit into
base: stable
Choose a base branch
from

Conversation

stolarczyk
Copy link

@stolarczyk stolarczyk commented Aug 30, 2024

In certain scenarios, annotations may include invalid or extraneous data that can obstruct the annotation processing workflow. To mitigate this, the warn_unicode_error parameter in the PDF initializer and the .open() method provides a configurable option to bypass these errors and generate warnings instead, ensuring smoother handling of such anomalies.

Example warning, if activated:

UserWarning: Could not decode contents for annotation. Annotation contents will be missing.

in some cases the the annotations may contain some junk that hinders annotations processing altogether. This allows to ignore the error and warn instead, which is configurable via warn_unicode_error arguments in the PDF initializer and/or open() method.
@jsvine
Copy link
Owner

jsvine commented Sep 9, 2024

Hi @stolarczyk, and thank you for this suggestion. It makes sense to provide such warnings, although I'd lean toward a more generalizable approach rather than specifying parameters for each type of warning. To that end, I'm more inclined to use Python's built-in warning filtering. I'm open to other opinions, though. What do you think?

@stolarczyk
Copy link
Author

stolarczyk commented Sep 10, 2024

thanks for looking into this, @jsvine!

I’m not entirely clear on how your idea regarding built-in warning filtering would address the issue I'm focusing on. The proposed change turns an exception (UnicodeDecodeError) into a warning, which prevents the PDF processing from crashing entirely. So the warning is the result, not the issue at hand.

@jsvine
Copy link
Owner

jsvine commented Oct 3, 2024

My apologies for the misunderstanding! I think the name of the proposed parameter threw me off, but I also should have looked more closely. I think I understand it now. This proposal makes sense, though what about tweaking the name?: raise_unicode_errors=True/False?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants