The issue I'm having is because of the degree symbol:
UTF-8 \xc2\xb0
http://www.fileformat.info/info/unicode/char/b0/index.htm
Below are the boiled-down calls. My real test data is properly formatted XML, but in testing I found that adding more surrounding text does not change the confidence or the output of the jschardet.detect() call.
With 1, 2, or 3 degree symbols, it detects windows-1252. Decoding with that encoding leaves a stray extra character in front of each symbol (the \xc2 byte is decoded on its own), since the data is actually UTF-8.
jschardet.detect('\xc2\xb0');
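To show where the extra \xc2 comes from, here is a minimal Node.js sketch that decodes the same two bytes both ways. It does not call jschardet. It uses Node's built-in 'latin1' decoding as a stand-in for windows-1252, which is an assumption that only holds because the two encodings agree on the bytes 0xC2 and 0xB0.

```javascript
// The UTF-8 encoding of U+00B0 (degree sign) is the two bytes 0xC2 0xB0.
const bytes = Buffer.from([0xc2, 0xb0]);

// Decoded as UTF-8, the two bytes form a single character.
const asUtf8 = bytes.toString('utf8');

// Decoded one byte per character (latin1, standing in for windows-1252 here),
// 0xC2 becomes a separate character in front of the degree sign.
const asSingleByte = bytes.toString('latin1');

console.log(JSON.stringify(asUtf8), asUtf8.length);
console.log(JSON.stringify(asSingleByte), asSingleByte.length);
```

The UTF-8 result is one character long, while the single-byte result is two characters long, which is the "extra \xc2" effect described above.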
With 4 degree symbols, it detects as EUC-KR
jschardet.detect('\xc2\xb0\xc2\xb0\xc2\xb0\xc2\xb0');
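For anyone who wants to reproduce this across several input lengths, the two calls above can be generalized into a small loop. This is only a sketch: degreeSigns and collectDetections are hypothetical helper names, and the detector is passed in as a parameter so the helper itself does not depend on any library.

```javascript
// Build a binary string holding n UTF-8 encoded degree signs (bytes 0xC2 0xB0).
function degreeSigns(n) {
  return '\xc2\xb0'.repeat(n);
}

// Run a detector over inputs of 1..max degree signs and collect the results.
// `detect` is any function that takes a string, e.g. jschardet.detect.
function collectDetections(detect, max) {
  const results = [];
  for (let n = 1; n <= max; n++) {
    results.push({ count: n, result: detect(degreeSigns(n)) });
  }
  return results;
}
```

Assuming jschardet is installed, it could be called as collectDetections(require('jschardet').detect, 5) and the results logged to see at which count the detected encoding changes.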