
Font name encoding issue #2971

Closed
@dothinking

Description of the bug

In my test case below, the font name (which contains Chinese characters) appears to be decoded incorrectly when it is extracted with get_fonts() or get_text('rawdict'). Please look into it, thanks.

How to reproduce the bug

import fitz  # PyMuPDF

doc = fitz.Document('sample.pdf')
print(doc[0].get_fonts())

# output:
#[(6,
#  'ttf',
#  'TrueType',
#  'BCDEEE+å\x8d\x8eæ\x96\x87仿å®\x8b',    <- a PDF viewer shows this name as 华文仿宋
#  'F1',
#  'WinAnsiEncoding')]

sample.pdf
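The garbled name is consistent with the UTF-8 bytes of 华文仿宋 being decoded as Latin-1: for example, 华 is the byte sequence E5 8D 8E, which Latin-1 renders as 'å\x8d\x8e', and 仿 (E4 BB BF) becomes '仿'. Assuming that diagnosis is right, a user-side workaround is to re-encode the string as Latin-1 and decode it as UTF-8. The helper names below (`repair_font_name`, `repair_font_list`) are hypothetical and not part of PyMuPDF; the tuple layout (xref, ext, type, basefont, name, encoding) is taken from the get_fonts() output shown above.

```python
def repair_font_name(name: str) -> str:
    """Hypothetical helper: undo UTF-8 bytes that were mis-decoded as Latin-1.

    Returns the name unchanged if it does not round-trip cleanly, so names
    that were already decoded correctly are left alone.
    """
    try:
        return name.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


def repair_font_list(fonts):
    """Apply repair_font_name to the basefont field (index 3) of each tuple.

    Assumes tuples shaped like the get_fonts() output in this issue:
    (xref, ext, type, basefont, name, encoding).
    """
    return [f[:3] + (repair_font_name(f[3]),) + f[4:] for f in fonts]


# The tuple from the output above, written with escapes for the garbled part.
garbled = [(6, "ttf", "TrueType",
            "BCDEEE+\u00e5\x8d\x8e\u00e6\x96\x87\u00e4\u00bb\u00bf\u00e5\u00ae\x8b",
            "F1", "WinAnsiEncoding")]
print(repair_font_list(garbled)[0][3])  # BCDEEE+华文仿宋
```

With a real document, the same call would be `repair_font_list(doc[0].get_fonts())`. This is a heuristic: a name that is genuinely Latin-1 but happens to form valid UTF-8 bytes would also be rewritten, which is unlikely for real font names but possible.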

PyMuPDF version

1.23.8

Operating system

Windows

Python version

3.8

Metadata

Labels

not a bug (not a bug / user error / unable to reproduce), wontfix (no intention to resolve)