UTF-8 Support #78

Open
krupong opened this issue Jul 29, 2024 · 13 comments

@krupong

krupong commented Jul 29, 2024

I am testing the signature -> set_metadata_props feature, but the text is not shown correctly.
My signing reason is "ทดสอบ".

[image: Screenshot_20240729_143316]

Does it support UTF-8 encoding?
Thank you.

@erikn69
Contributor

erikn69 commented Jul 29, 2024

Try #79

@krupong
Author

krupong commented Jul 30, 2024

Hello, I've tried this:

Try #79

It truncates some characters: for example, "ภาษาไทย" comes back as "ภา".

[image]

So I changed this:

return "\xFE\xFF" . mb_convert_encoding($string, 'UTF-16BE', $encoding);

to this:

return "\xEF\xBB\xBF".mb_convert_encoding($string, 'UTF-8', $encoding);

Now it is shown correctly.
[image]

Thank you.

@erikn69
Contributor

erikn69 commented Jul 30, 2024

So I changed this:
return "\xFE\xFF" . mb_convert_encoding($string, 'UTF-16BE', $encoding);
to this:
return "\xEF\xBB\xBF".mb_convert_encoding($string, 'UTF-8', $encoding);

With that change I get this:

[image]

@dealfonso
Owner

What about using a custom encoded string when setting the metadata?

@erikn69
Contributor

erikn69 commented Jul 30, 2024

What about using a custom encoded string when setting the metadata?

That would work, but anyone who doesn't know they have to do their own encoding will run into the same problem and open a new issue.

@erikn69
Contributor

erikn69 commented Jul 30, 2024

@dealfonso One question: if the file declares ANSI as its encoding and the reason is in UTF-8 or another encoding, wouldn't this problem occur?

Look, I sent UTF-8 and it doesn't work:

/Reason(ภาษาไทย)/Location(sdfs ó í í)>>

But it does work when I send ISO-8859-1:

/Reason(ó í í {} ` ~)/Location(sdfs ó í í)>>
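A possible explanation for the difference, as a sketch only: if the viewer reads a string without a byte order mark as one character per byte, each multi-byte UTF-8 sequence turns into several unrelated characters, while ISO-8859-1 input already has one byte per character. ISO-8859-1 is used below as a stand-in for the viewer's single-byte encoding, which is an assumption, and `read_as_single_byte` is a hypothetical helper.

```php
<?php
// Simulate a reader that treats every byte as one ISO-8859-1 character.
// (Stand-in assumption for the viewer's single-byte decoding.)
function read_as_single_byte(string $bytes): string
{
    // Re-express the misread bytes as UTF-8 so they can be printed.
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

$utf8   = "\xC3\xB3";   // "ó" encoded as UTF-8 (two bytes)
$latin1 = "\xF3";       // "ó" encoded as ISO-8859-1 (one byte)

echo read_as_single_byte($utf8), "\n";   // two characters: "Ã" and "³"
echo read_as_single_byte($latin1), "\n"; // one character: "ó"
```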

@dealfonso
Owner

Honestly, I have not considered this topic before. A quick search on Google [1] suggests that PDF does not define a character encoding in general. Instead, the encoding depends on the font, so the same character code can be rendered differently from one font to another.

I don't know how this applies to the reason and the other metadata fields.

That is why my "quick answer" is that PDF does not support UTF-8, so users need to encode the characters according to their needs.

I'll read more about character encoding in the metadata. Do you have any sources I could read?

[1] https://www.gnostice.com/nl_article.asp?id=383&t=Font_and_Encoding_Standard_types_supported_in_PDF_for_the_representation_of_text_content

@erikn69
Contributor

erikn69 commented Jul 30, 2024

It considers that the encoding depends on the font, and depending on the font, the same character will show a representation or another

But that applies to text content; metadata doesn't use fonts.

@erikn69
Contributor

erikn69 commented Jul 30, 2024

I tried FPDF, and it works with UTF-8:

/Keywords (þÿ Ì + ^ ì ò Ò ê)

But it doesn't work here:
https://github.com/Setasign/FPDF/blob/0838e0ee4925716fcbbc50ad9e1799b5edfae0a0/fpdf.php#L1169C1-L1189C2
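The `þÿ` at the start of the FPDF string is what the bytes 0xFE 0xFF (the UTF-16BE byte order mark) look like when read as single-byte text, so FPDF appears to be writing UTF-16BE rather than raw UTF-8. A minimal sketch for checking which form a raw metadata string is in; `detect_text_string_encoding` is a hypothetical helper, not part of sapp or FPDF:

```php
<?php
// Hypothetical helper: guess how a raw PDF text string is encoded
// by looking at its leading byte order mark, if any.
function detect_text_string_encoding(string $raw): string
{
    if (strncmp($raw, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';      // BOM FE FF
    }
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8';         // BOM EF BB BF
    }
    return 'single-byte';       // no BOM: one byte per character
}

// Build a string the same way as the UTF-16BE line quoted earlier in this thread.
$withBom = "\xFE\xFF" . mb_convert_encoding('ó í', 'UTF-16BE', 'UTF-8');
echo detect_text_string_encoding($withBom), "\n";        // UTF-16BE
echo detect_text_string_encoding('plain reason'), "\n";  // single-byte
```

As far as I know, the UTF-8 BOM form is only defined from PDF 2.0 on, which might be why it renders in some viewers and not in others.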

@krupong
Author

krupong commented Jul 31, 2024

I tried signing with TCPDF, and it works with UTF-8 too.
Opened in VS Code:

[image: Screenshot 2567-07-31 at 11.24.39]

Signed with sapp, it seems to be stored as plain text:
[image: Screenshot 2567-07-31 at 11.30.07]

@krupong
Author

krupong commented Sep 21, 2024

I tried #79, which encodes the metadata as UTF-16BE with a BOM, and in general it works.

The problem is that when the string contains "\x0E\x28" (ศ) or "\x0E\x29" (ษ), the metadata is broken.
I think that when the PDF is compiled, the encoded bytes contain a "(" (0x28) or ")" (0x29), which causes the incorrect display.

For example, I set my string to "ภาษาไทย".
[image]

It is shown like this:
[image]

As a dirty fix, I added "(" or ")" at the beginning or end of the string.

So I added "(" to the beginning of "ภาษาไทย", like this: "(ภาษาไทย".
[image]

Then my signature is shown like this:
[image]

It displays the text correctly, but there is still a "(" in front of it.

Is there a correct way to deal with this problem?

Thank you.
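One way to approach this, as a sketch: inside a PDF literal string `( ... )`, a backslash and an unbalanced parenthesis have to be escaped with a backslash, and that applies to the raw bytes after the UTF-16BE conversion as well. ษ is U+0E29, so its second byte is 0x29, i.e. `)`, which closes the string early. The function name below is hypothetical; this is not the code from #79 or #84.

```php
<?php
// Hypothetical helper (not sapp's actual API): build the body of a PDF
// literal string, i.e. the bytes that go between "(" and ")".
function pdf_text_string(string $text, string $fromEncoding = 'UTF-8'): string
{
    // UTF-16BE with a leading byte order mark (FE FF).
    $bytes = "\xFE\xFF" . mb_convert_encoding($text, 'UTF-16BE', $fromEncoding);

    // Escape backslash and both parentheses. strtr() makes a single pass,
    // so the backslashes it inserts are not escaped a second time.
    return strtr($bytes, ["\\" => "\\\\", "(" => "\\(", ")" => "\\)"]);
}

// This file is assumed to be saved as UTF-8.
echo '/Reason(' . pdf_text_string('ภาษาไทย') . ')', "\n";
echo bin2hex(pdf_text_string('ษ')), "\n"; // feff0e5c29
```

An alternative that avoids escaping altogether is to write a hex string such as `<FEFF0E29>` instead of a literal string, at the cost of doubling the size.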

@angeljqv

Feel free to make a PR with the fix 👍

@krupong
Author

krupong commented Sep 21, 2024

Could you please check #84?

Thank you.

4 participants