MSConvert outputs wrong encoding for Waters raw files #3186
Comments
What element/attribute had the non-ASCII character? Filepaths should be UTF-8 encoded and XML ids and idrefs should be xHHHH encoded.
This actually only occurs in the id and idrefs for the file I am looking at. For instance: It looks like no
Interesting. Looks like Visual C++'s isalpha() function says µ is an alphabetic character. But their example code implies it's not: https://learn.microsoft.com/en-us/cpp/c-runtime-library/character-classification?view=msvc-170
Looks like I need to go for the simpler approach.
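For anyone following along, here is a minimal sketch of what a "simpler approach" could look like: classify characters with explicit ASCII range checks instead of isalpha(), so the result cannot depend on the C locale or on whether char is signed. This is not the actual ProteoWizard code. The names isAsciiAlnum and escapeXmlId, the whitelist of unescaped characters, and the `_xHHHH_` delimiter style are all assumptions made for illustration. It also assumes the id is in a single-byte encoding where the byte value equals the code point, which holds for µ (0xB5) in Windows-1252.

```cpp
#include <cstdio>
#include <string>

// Locale-independent check: true only for ASCII letters and digits.
// Unlike isalpha()/isalnum(), this never reports a byte such as 0xB5 as
// alphabetic, whatever the current locale is.
inline bool isAsciiAlnum(unsigned char c)
{
    return (c >= '0' && c <= '9') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z');
}

// Hypothetical helper: copies whitelisted ASCII characters unchanged and
// escapes every other byte as _xHHHH_ (four uppercase hex digits).
std::string escapeXmlId(const std::string& id)
{
    std::string out;
    for (unsigned char c : id)
    {
        if (isAsciiAlnum(c) || c == '_' || c == '-' || c == '.')
        {
            out += static_cast<char>(c);
        }
        else
        {
            char buf[8]; // "_x" + 4 hex digits + "_" + terminating NUL
            std::snprintf(buf, sizeof buf, "_x%04X_", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

With this sketch, an id containing the single byte 0xB5 comes out as pure ASCII, so the declared encoding of the file no longer matters for ids and idrefs.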
We are using MSConvert to convert Waters raw files to mzML. Unfortunately we have been experiencing issues with downstream processing. The issues seem to be caused by the encoding of the mzML files.

The mzML files are encoded with Windows-1252, but the header reports UTF-8.

This causes our XML parser to assume UTF-8 and fail when running into non-ASCII characters (in our case µ, byte b5, was problematic).

I would suggest either converting the files to UTF-8 or reporting the correct encoding in the header (I could not find any requirement for UTF-8 in the mzML specification).
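
To illustrate why a parser that trusts the UTF-8 declaration fails on this byte, here is a small self-contained sketch. In UTF-8, 0xB5 is a continuation byte and cannot stand on its own, while the UTF-8 encoding of µ (U+00B5) is the two bytes C2 B5. The function names isValidUtf8 and latin1RangeToUtf8 are made up for this example and are not part of MSConvert. The converter only handles ASCII and the bytes 0xA0 to 0xFF, where Windows-1252 agrees with Latin-1; bytes 0x80 to 0x9F would need a full Windows-1252 table, which is left out here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true if s is well-formed UTF-8. Rejects stray continuation bytes,
// truncated sequences, overlong forms, surrogates and values above U+10FFFF.
bool isValidUtf8(const std::string& s)
{
    static const unsigned long minCp[] = {0, 0x0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false; // e.g. a lone 0xB5, which is a continuation byte
        if (i + len > n) return false; // sequence cut off at end of input
        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < minCp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// Re-encodes single-byte text as UTF-8. Assumption: only ASCII and bytes
// 0xA0..0xFF occur, where the byte value equals the Unicode code point.
std::string latin1RangeToUtf8(const std::string& s)
{
    std::string out;
    for (unsigned char c : s)
    {
        if (c < 0x80)
        {
            out += static_cast<char>(c);
        }
        else if (c >= 0xA0)
        {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
        else
        {
            throw std::invalid_argument("0x80-0x9F needs a full Windows-1252 table");
        }
    }
    return out;
}
```

A check like isValidUtf8 could be run on the converter output as a sanity test before handing the file to a strict XML parser.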