Description
Describe the bug (mandatory)
The word positions (bounding boxes) returned by Page.get_text('words', flags=fitz.TEXT_INHIBIT_SPACES) are inaccurate.
To Reproduce (mandatory)
PyMuPDF version: 1.23.5
The code below reproduces the bug:
import fitz  # PyMuPDF

document = fitz.open('data/word_test.pdf')
page = document.load_page(0)
# Each item is (x0, y0, x1, y1, word, block_no, line_no, word_no)
words = page.get_text('words', flags=fitz.TEXT_INHIBIT_SPACES)
for word in words:
    # Outline each word's bounding box in green
    rect = fitz.Rect(word[0], word[1], word[2], word[3])
    color = (0, 1, 0)
    page.draw_rect(rect, color)
document.save('word_test_new.pdf')
When the extracted word boxes are drawn on the page, some of them are abnormally large, much larger than I expected. Is there a setting or option I might be missing that would produce tighter boxes?
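To make "abnormally large" measurable, a small helper can flag words whose box height is far above the typical height on the page. This is a minimal sketch, not part of PyMuPDF: the function name `find_oversized_words` and the threshold `factor=2.0` are hypothetical, and it assumes the oversized boxes show up mainly as excess height. The input is assumed to have the same tuple shape as the "words" output used in the reproduction code above.

```python
from statistics import median


def find_oversized_words(words, factor=2.0):
    """Return words whose box height exceeds `factor` times the median height.

    Hypothetical helper: `words` is assumed to be a sequence of tuples shaped
    like PyMuPDF's "words" output: (x0, y0, x1, y1, text, block_no, line_no, word_no).
    """
    heights = [w[3] - w[1] for w in words]
    if not heights:
        return []
    typical = median(heights)
    return [w for w, h in zip(words, heights) if h > factor * typical]


# Synthetic sample data (not taken from word_test.pdf)
sample = [
    (10, 10, 40, 22, "Hello", 0, 0, 0),
    (45, 10, 80, 22, "world", 0, 0, 1),
    (10, 30, 50, 42, "small", 0, 1, 0),
    (55, 5, 90, 60, "BIG", 0, 1, 1),
]
print([w[4] for w in find_oversized_words(sample)])  # ['BIG']
```

With PyMuPDF, the list returned by `page.get_text('words', flags=fitz.TEXT_INHIBIT_SPACES)` could be passed in directly, and the flagged boxes drawn in a different color to isolate the problematic words.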