Big5 string not detected #169
Unanswered
primalspacesystems asked this question in Q&A
Replies: 1 comment
-
Notepad++ probably uses the same algorithm under the hood.
It is indeed a best guess. Please check the other results as well, since the detector can return multiple detected encodings.
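To illustrate why a detector can only guess and may report several candidates, here is a minimal Python sketch (not UTFUnknown's actual algorithm, and not its API). It takes the test string from the question, encodes it as Big5, and checks which of a few encodings can decode those bytes without error. The helper name `decodable_with` and the choice of encodings to try are assumptions made for this example.

```python
# Sketch only: shows that the same bytes can be valid in more than one encoding.
TEST_STRING = "Test characters:敏捷的棕色狐狸跳過了懶狗。"


def decodable_with(raw, encodings):
    """Return the encodings (hypothetical helper) that decode `raw` without error."""
    accepted = []
    for name in encodings:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        accepted.append(name)
    return accepted


raw = TEST_STRING.encode("big5")
candidates = decodable_with(raw, ["utf-8", "big5", "latin-1"])
print(candidates)
```

Decoding without error does not mean the result is the right text: Latin-1 accepts any byte sequence, so a detector has to rank candidates by likelihood, which is why the lower-ranked results are worth inspecting.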
-
I'm doing some work with 9 different encodings in some unit tests - Windows-1252, shift_jis, windows-1250, windows-1251, windows-1253, windows-1255, windows-1256, ks_c_5601-1987, and big5.
Basically, I use Google Translate to localize 'The quick brown fox jumps over the lazy dog' into the appropriate language, then save the result twice: once in the specific encoding and once as UTF-8. I then read the encoding-specific file back using the encoding detected by UTFUnknown and save it as UTF-8. Finally, I compare the originally saved UTF-8 file with the newly saved one. Eight are identical (as expected), but the big5 one isn't.
The test string is: "Test characters:敏捷的棕色狐狸跳過了懶狗。"
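For anyone wanting to reproduce the check, here is a minimal sketch of the round trip in Python (the original tests use UTFUnknown, so this is an illustration only). The detected encoding is passed in as a plain string rather than obtained from a real detector, and `round_trip_matches` is a hypothetical helper name.

```python
# Sketch only: the "detected" encoding is supplied by hand, not by a detector.
TEST_STRING = "Test characters:敏捷的棕色狐狸跳過了懶狗。"


def round_trip_matches(text, source_encoding, detected_encoding):
    """Encode `text`, decode it with the detected encoding, compare as UTF-8."""
    reference_utf8 = text.encode("utf-8")     # the originally saved UTF-8 file
    raw = text.encode(source_encoding)        # the file saved in the specific encoding
    try:
        decoded = raw.decode(detected_encoding)
    except UnicodeDecodeError:
        return False                          # the bytes are invalid in the detected encoding
    return decoded.encode("utf-8") == reference_utf8


print(round_trip_matches(TEST_STRING, "big5", "big5"))
print(round_trip_matches(TEST_STRING, "big5", "cp1252"))
```

The comparison only succeeds when the detected encoding maps every byte back to the original characters, so a wrong guess shows up as a mismatch even when decoding itself raises no error.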
Notepad++ doesn't autodetect the big5 encoding either, although it also fails to detect Windows-1250 and Windows-1256.
Any tips? Is this expected? I know detecting encoding isn't an exact science, but it seems to work quite well. Is this a flaw in the algorithm?
Cheers
John