Description
Description of the bug
When extracting text with page.get_text("dict")["blocks"], the span bboxes of regular text and ordinary symbols are correct, but the bbox of the square root symbol (√) is noticeably wrong. See the debug output below and the rectangles I redrew on the original PDF page from the returned bboxes: the √ bbox is offset from the glyph's actual position by nearly one line height. In the debug output its top edge (y0 = 383.49) lies about 8 pt above the top edges of the neighboring full-size spans (y0 = 391.57 and 391.36).
Debug message:
.....
DEBUG: PDF Span:(query with all keys, divide each by) bbox:[108.0, 391.57, 247.56, 401.57] fonts:NimbusRomNo9L-Regu
DEBUG: PDF Span:(√ ) bbox:[250.05, 383.49, 258.35, 393.46] fonts:CMSY10
DEBUG: PDF Span:(d) bbox:[258.35, 391.36, 263.54, 401.33] fonts:CMMI10
DEBUG: PDF Span:(k) bbox:[263.54, 395.19, 267.77, 402.17] fonts:CMMI7
.....
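To put a number on the offset, here is a minimal sketch that compares the bottom edge (y1) of each span against the median y1 of the group. It assumes the four spans in the debug output belong to the same text line; the helper name `misplaced_spans` and the 0.5 tolerance are illustrative choices of mine, not part of PyMuPDF.

```python
from statistics import median

# Spans copied from the debug output above: (text, bbox, font).
SPANS = [
    ("query with all keys, divide each by", (108.0, 391.57, 247.56, 401.57), "NimbusRomNo9L-Regu"),
    ("√ ", (250.05, 383.49, 258.35, 393.46), "CMSY10"),
    ("d", (258.35, 391.36, 263.54, 401.33), "CMMI10"),
    ("k", (263.54, 395.19, 267.77, 402.17), "CMMI7"),
]


def misplaced_spans(spans, tolerance=0.5):
    """Return (text, offset) for spans whose bottom edge strays from the group.

    The offset is the span's y1 minus the median y1 of all spans. A span is
    reported when |offset| exceeds `tolerance` times its own bbox height.
    This is a heuristic (assumption), not a PyMuPDF feature.
    """
    reference_y1 = median(bbox[3] for _, bbox, _ in spans)
    flagged = []
    for text, bbox, _font in spans:
        height = bbox[3] - bbox[1]
        offset = bbox[3] - reference_y1
        if abs(offset) > tolerance * height:
            flagged.append((text, round(offset, 2)))
    return flagged


print(misplaced_spans(SPANS))
```

With the values above, only the √ span is flagged. The subscript k sits slightly lower than its neighbors, but stays well within the tolerance.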
Image: span bboxes drawn over the original PDF page
How to reproduce the bug
Test PDF file: https://arxiv.org/pdf/1706.03762
Test page: 4
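A minimal reproduction sketch, assuming the PDF has been saved locally. The helper names `format_span` and `dump_spans` and the file name in the example call are hypothetical; the output format only approximates the debug lines shown in this report.

```python
def format_span(span):
    """Render one span dict in the style of the debug lines in this report."""
    bbox = [round(v, 2) for v in span["bbox"]]
    return f"DEBUG: PDF Span:({span['text']}) bbox:{bbox} fonts:{span['font']}"


def dump_spans(pdf_path, page_index):
    """Print every text span on one page of a locally saved PDF."""
    import pymupdf  # imported lazily; requires PyMuPDF to be installed

    doc = pymupdf.open(pdf_path)
    try:
        page = doc[page_index]
        for block in page.get_text("dict")["blocks"]:
            # Image blocks carry no "lines" key, so default to an empty list.
            for line in block.get("lines", []):
                for span in line["spans"]:
                    print(format_span(span))
    finally:
        doc.close()


# Example call (file name is an assumption; page 4 has index 3):
# dump_spans("1706.03762.pdf", 3)
```

Page indices are 0-based, so page 4 of the test file is index 3.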
PyMuPDF version
1.25.1
Operating system
macOS
Python version
3.10