ENH: Add command to extract annotated pages #97
Comments
Hi @wolfram77 👋 In order to determine if
Hello @Lucas-C Thanks for considering my request. Yes, extracting only pages with annotations from a PDF would be useful, as it would help my guide and other professors filter out the pages they have commented on, especially when reviewing theses. As for adding this to pdfly, perhaps a shorter |
Alright. I think that keep the Would you like to submit a PR to implement this feature? 🙂 You will find some documentation on how to detect annotations using |
I tried the following, but it seems to include far more pages than expected.

```python
from pypdf import PdfReader, PdfWriter

# input_pdf and output_pdf are paths supplied by the caller
reader = PdfReader(str(input_pdf))
writer = PdfWriter()
# Copy only the pages with annotations
for page in reader.pages:
    if "/Annots" in page:
        writer.add_page(page)
# Save the output PDF
writer.write(output_pdf)
```
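A likely reason the `/Annots` check selects too many pages is that PDF also stores hyperlinks as annotations (subtype `/Link`), so a thesis with a clickable table of contents or citations can have `/Annots` on almost every page. The sketch below is one way to check this by counting annotation subtypes across a document. The names `tally_subtypes` and `_resolve` are hypothetical helpers introduced for illustration; they are not part of pypdf or pdfly.

```python
from collections import Counter
from collections.abc import Mapping


def _resolve(obj):
    """Follow an indirect reference if the object has get_object(); otherwise return it unchanged."""
    get_object = getattr(obj, "get_object", None)
    return get_object() if callable(get_object) else obj


def tally_subtypes(pages):
    """Count annotations by /Subtype over an iterable of page dictionaries."""
    counts = Counter()
    for page in pages:
        annots = _resolve(page.get("/Annots"))
        if not isinstance(annots, list):
            continue  # page has no usable annotation array
        for ref in annots:
            annot = _resolve(ref)
            if isinstance(annot, Mapping):
                subtype = _resolve(annot.get("/Subtype", "<missing>"))
                counts[str(subtype)] += 1
    return counts
```

With pypdf installed, `tally_subtypes(PdfReader("thesis.pdf").pages)` would give the breakdown for a real file (`thesis.pdf` is a placeholder path). If `/Link` dominates the counts, the over-selection comes from hyperlinks rather than reviewer comments.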
That's a good start 🙂 👍 There are many kinds of PDF annotations:
In order to distinguish between those, you will have to check their For more information, you can check the PDF specs: https://developer.adobe.com/document-services/docs/assets/5b15559b96303194340b99820d3a70fa/PDF_ISO_32000-2.pdf |
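As a rough sketch of the filtering suggested above: in the PDF specification, every annotation dictionary carries a `/Subtype` entry, so this example assumes that is the key to inspect. Which subtypes count as "review comments" is an assumption made here (the `REVIEW_SUBTYPES` set), and the function names are hypothetical rather than an existing pdfly or pypdf API. `/Link` and `/Widget` are deliberately excluded.

```python
from collections.abc import Mapping

# Assumption: subtypes a reviewer would typically create.
# Hyperlinks (/Link) and form fields (/Widget) are left out on purpose.
REVIEW_SUBTYPES = frozenset({
    "/Text", "/FreeText", "/Highlight", "/Underline", "/Squiggly",
    "/StrikeOut", "/Caret", "/Ink", "/Stamp", "/Line", "/Square",
    "/Circle", "/Polygon", "/PolyLine", "/FileAttachment",
})


def _resolve(obj):
    """Follow an indirect reference if the object has get_object(); otherwise return it unchanged."""
    get_object = getattr(obj, "get_object", None)
    return get_object() if callable(get_object) else obj


def annotation_subtypes(page):
    """Return the /Subtype of each annotation on a page dictionary, in order."""
    annots = _resolve(page.get("/Annots"))
    if not isinstance(annots, list):
        return []
    subtypes = []
    for ref in annots:
        annot = _resolve(ref)
        if isinstance(annot, Mapping):
            subtypes.append(str(_resolve(annot.get("/Subtype", ""))))
    return subtypes


def has_review_annotation(page):
    """True if the page carries at least one annotation listed in REVIEW_SUBTYPES."""
    return any(s in REVIEW_SUBTYPES for s in annotation_subtypes(page))


def extract_annotated_pages(input_pdf, output_pdf):
    """Copy pages with review annotations to output_pdf; return how many were kept."""
    # Imported here so the helpers above can be tried on plain dicts without pypdf.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(str(input_pdf))
    writer = PdfWriter()
    kept = 0
    for page in reader.pages:
        if has_review_annotation(page):
            writer.add_page(page)
            kept += 1
    writer.write(str(output_pdf))
    return kept
```

Keeping the subtype set as a module-level constant makes it easy to adjust, for example to keep only `/Highlight` and `/Text` if other markup turns out to be noise.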
Hello pdfly contributors, I hope you're doing well. My advisor has repeatedly asked for a way to filter annotated pages from a PDF (thesis). I managed to find a solution using pymupdf, but having a CLI tool for this would be helpful. Any suggestions on how to integrate this feature into pdfly?