Word bounding boxes returned by `Page.get_text('words', flags=fitz.TEXT_INHIBIT_SPACES)` are inaccurate


## Describe the bug (mandatory)
The word positions returned by `Page.get_text('words', flags=fitz.TEXT_INHIBIT_SPACES)` are off: some of the reported rectangles are much larger than the text they should enclose.

## To Reproduce (mandatory)
[words_test.pdf](https://github.com/pymupdf/PyMuPDF/files/13315186/words_test.pdf)
![image](https://github.com/pymupdf/PyMuPDF/assets/22074904/394068d7-c85e-4e48-b062-6962404b278b)
![image](https://github.com/pymupdf/PyMuPDF/assets/22074904/2cd99b6e-b8b2-4636-bcaa-42a8f189686b)

PyMuPDF version: 1.23.5

The code below reproduces the bug:

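```python
import fitz  # PyMuPDF

document = fitz.open('data/word_test.pdf')
page = document.load_page(0)
# Each word tuple is (x0, y0, x1, y1, text, block_no, line_no, word_no).
words = page.get_text('words', flags=fitz.TEXT_INHIBIT_SPACES)
for word in words:
    # Outline every reported word box in green.
    rect = fitz.Rect(word[0], word[1], word[2], word[3])
    color = (0, 1, 0)
    page.draw_rect(rect, color)
document.save('word_test_new.pdf')
```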

Some of the word boxes returned by `Page.get_text('words', flags=fitz.TEXT_INHIBIT_SPACES)` are much larger than I expected, as the screenshots above show. Is there a way to improve this that I might be missing?
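To make "much larger than expected" easier to check, here is a rough sketch of how suspicious boxes could be listed instead of spotted by eye. It assumes only the documented word tuple layout `(x0, y0, x1, y1, text, block_no, line_no, word_no)`. The helper name `flag_oversized_words` and the `ratio` threshold are hypothetical, not part of PyMuPDF, and the width-per-character heuristic is just one possible way to define "abnormal".

```python
from statistics import median


def flag_oversized_words(words, ratio=3.0):
    """Return word tuples whose width per character is far above the median.

    `words` is a list of tuples shaped like the output of
    page.get_text('words'): (x0, y0, x1, y1, text, ...).
    `ratio` is an arbitrary, hypothetical threshold; tune it per document.
    """
    # Width of each box divided by the number of characters in its text.
    per_char = [(w[2] - w[0]) / max(len(w[4]), 1) for w in words]
    if not per_char:
        return []
    typical = median(per_char)
    if typical <= 0:
        return []
    return [w for w, pc in zip(words, per_char) if pc > ratio * typical]


# Hand-made sample data (not taken from the attached PDF).
sample = [
    (10, 10, 40, 22, "Hello", 0, 0, 0),
    (45, 10, 75, 22, "world", 0, 0, 1),
    (80, 10, 300, 22, "text", 0, 0, 2),  # far too wide for four characters
]
suspicious = flag_oversized_words(sample)
```

With a real page, `words` from the reproduction script could be passed in directly, and only the flagged rectangles drawn, which should make the problem cases easier to isolate.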


Issue #2796
