bullet point content overflowing to next page in the PDF #4240
Unanswered
krish-tech02 asked this question in Looking for help
@JorjMcKie can you please help?
Uploading Arthralgia 01-09-2025-bullet-overflow-to-next-page.pdf…
In the attached PDF, the last bullet point on page 1 starts with "Warmth or redness:". Its content overflows and continues onto the next page. I am using the PyMuPDF library and the `page.get_text("blocks", flags=1+2+8)` method to extract the PDF content and convert it into HTML. I want to wrap each bullet point in an `<li>` tag, but since the last bullet point's content spans two pages, it gets extracted as separate blocks on different pages. Is there a way to identify that the content on the next page belongs to the same bullet point from the previous page? I considered using the `x` and `y` coordinates, but they don't seem to change enough to differentiate between a continued bullet point and new paragraph content. Could someone please help me figure out how to handle this?
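
One possible approach, since the coordinates alone are not decisive, is to look at the text itself. The sketch below is a heuristic, not a PyMuPDF feature. It works on the block tuples that `get_text("blocks")` returns, `(x0, y0, x1, y1, text, block_no, block_type)`, and treats the first text block of a page as a continuation when the previous block is a bullet item, the new block does not start with a bullet marker, and either the previous text lacks closing punctuation or the new text begins in lowercase. The helper names, the bullet marker set, and the punctuation rule are all assumptions for illustration and would need tuning for a real document, in particular if pages carry headers or footers.

```python
import re

# Assumed bullet markers (common glyphs, "-", "*", "1." or "1)"); adjust per PDF.
BULLET_RE = re.compile(r"^\s*(?:[\u2022\u25cf\u25e6\u2023\u2043\-\*]|\d+[.)])\s+")
# Assumed punctuation that marks a finished sentence or label.
SENTENCE_END = (".", "!", "?", ":")


def starts_with_bullet(text):
    """Return True if the text begins with one of the assumed bullet markers."""
    return bool(BULLET_RE.match(text))


def is_continuation(prev_text, next_text):
    """Guess whether next_text continues prev_text (heuristic only)."""
    prev_text = prev_text.strip()
    next_text = next_text.strip()
    if not prev_text or not next_text:
        return False
    if starts_with_bullet(next_text):
        return False  # a new bullet item starts here
    ends_open = not prev_text.endswith(SENTENCE_END)
    starts_lower = next_text[0].islower()
    return ends_open or starts_lower


def merge_across_pages(pages):
    """Merge a bullet item that spills over a page break.

    pages: one list of block tuples per page, in reading order.
    Returns a flat list of block texts with continuations joined.
    """
    merged = []
    for page_index, blocks in enumerate(pages):
        text_blocks = [b for b in blocks if b[6] == 0]  # block_type 0 = text
        for i, block in enumerate(text_blocks):
            text = " ".join(block[4].split())  # collapse line breaks and spaces
            first_on_page = i == 0 and page_index > 0
            if (
                first_on_page
                and merged
                and starts_with_bullet(merged[-1])
                and is_continuation(merged[-1], text)
            ):
                merged[-1] += " " + text
            else:
                merged.append(text)
    return merged
```

The input could be built with something like `pages = [page.get_text("blocks", flags=1+2+8) for page in doc]`, where `doc` is the opened document. Each merged entry that starts with a bullet marker can then be wrapped in a single `<li>`. A bullet that happens to end exactly at the page break with a full stop cannot be distinguished from a finished item by this rule, so comparing the left indent (`x0`) of the continuation with the indent of the bullet's text lines may be a useful additional check.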