Font name encoding issue #2971
Comments
Thanks for the quick reply. But why can a PDF viewer display the correct font name? Could this be an upstream issue in MuPDF?
No, of course not. We also do not look into the font's binary at all at this point. With text extraction and font extraction itself, access to the font's self-identification is included; see here:
import fitz
doc = fitz.open("sample.pdf")
ff = doc.extract_font(6)  # extract the font stored at xref 6
font = fitz.Font(fontbuffer=ff[-1])  # the last item is the font file content
font
Font('STFangsong Regular')
You can also use Python's own capabilities to interpret the name:
page = doc[0]
item = page.get_fonts()[0]
fontname = item[3]  # base font name as reported by get_fonts()
# turn each character back into its byte value, then decode the bytes as UTF-8
realname = bytes([ord(c) for c in fontname]).decode()
realname
'BCDEEE+华文仿宋'
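For anyone who wants to apply the same reinterpretation to every font name without risking an exception, here is a minimal sketch. It assumes, as in the sample above, that the reported name holds one character per UTF-8 byte. The helper name repair_font_name is hypothetical and not part of PyMuPDF; names that do not fit this assumption are returned unchanged.

```python
def repair_font_name(name: str) -> str:
    """Reinterpret a font name whose characters are really UTF-8 byte values.

    Hypothetical helper (not a PyMuPDF API). Equivalent in effect to
    bytes([ord(c) for c in name]).decode(), but falls back to the input
    when the name cannot be reinterpreted that way.
    """
    try:
        # Latin-1 maps code points 0..255 to the same byte values,
        # so this recovers the raw bytes behind the string.
        raw = name.encode("latin-1")
        # Decode the recovered bytes as UTF-8 to get the real name.
        return raw.decode("utf-8")
    except UnicodeError:
        # Either a character is above 255 (already proper Unicode) or the
        # bytes are not valid UTF-8: leave the name as it was.
        return name
```

With this, repair_font_name(item[3]) could be used in place of the one-liner above when looping over all entries of page.get_fonts().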
Thanks for the detailed explanation. I fully understand now.
Description of the bug
In my test case below, the font name (which contains Chinese characters) appears to be incorrectly encoded when extracted with get_fonts() or get_text('rawdict'). Please look into it, thanks.
How to reproduce the bug
sample.pdf
PyMuPDF version
1.23.8
Operating system
Windows
Python version
3.8