Improve how PDFHandler caches single pages of a PDF. #487

Open
wants to merge 1 commit into
base: master


@stonyw stonyw commented Jan 31, 2024

Change how the temp directory is shared and cleaned up.

The `with` statement is contagious: the temp directory cannot be shared across a PDFHandler instance unless the signature of `__init__` is changed, and that would make cleaning up the directory the upper layer's duty.
To hide this implementation detail, use finalizers to clean up.
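
As a rough illustration of the finalizer approach described above, here is a minimal sketch using the standard library's `weakref.finalize`. The class `TempDirOwner` and its attribute names are hypothetical stand-ins for PDFHandler and are not taken from the actual patch.

```python
import os
import shutil
import tempfile
import weakref


class TempDirOwner:
    """Hypothetical stand-in for PDFHandler; not camelot's actual class."""

    def __init__(self):
        # mkdtemp does not need a `with` block, so the directory can be
        # shared by every method of the instance.
        self._tempdir = tempfile.mkdtemp()
        # The callback must not hold a reference to `self`, otherwise the
        # object would be kept alive; pass the path as a plain argument.
        self._finalizer = weakref.finalize(
            self, shutil.rmtree, self._tempdir, ignore_errors=True
        )

    @property
    def tempdir(self):
        return self._tempdir
```

With this shape the caller never sees the directory's lifetime: the finalizer runs when the instance is garbage collected or at interpreter exit, and it can also be called explicitly (`owner._finalizer()`) if earlier cleanup is wanted.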

Add _get_temp_path so that the temp PDF file is always accessed in the same way.

Hide implementation details. The temp PDF can now be reused after calling parse().
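
A small sketch of how a single path helper can make the per-page temp PDF reusable. Only the names `_get_temp_path` and `_save_page` come from the description above; the class `PageCache`, the `page-N.pdf` file naming, and the `writer` callback parameter are assumptions made for illustration, not the signatures used in the patch.

```python
import os
import tempfile


class PageCache:
    """Hypothetical sketch; the signatures here are assumptions."""

    def __init__(self, tempdir):
        self._tempdir = tempdir

    def _get_temp_path(self, page):
        # Single place that decides where a page's temp PDF lives.
        return os.path.join(self._tempdir, f"page-{page}.pdf")

    def _save_page(self, page, writer):
        # Write the page only if it is not cached yet, then return its path.
        path = self._get_temp_path(page)
        if not os.path.exists(path):
            writer(path)
        return path
```

Because every caller goes through `_get_temp_path`, a second request for the same page finds the file already on disk and skips the write.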

Update the _save_page parameters to match the change.

Use properties instead.

@MartinThoma
Contributor

Hey!

As camelot is dead, we are trying to build a maintained fork at pypdf_table_extraction.

Do you want to open the PR against that branch so that we can merge your improvement?
