[BUG] HttpCollectImpl XML parsing assumes UTF-8 #2852
Comments
The underlying issue is more that you convert to a String in the first place. Lines 135 to 139 in 1fe70a6
Using an InputStream or a byte array should be more efficient than a String. It would definitely not be worse.
Sorry for missing that, got it thanks, we will consider parsing directly from the bytes stream later, just like described in |
@tomsun28 |
of course, welcome. |
@tomsun28 Changing String resp = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8); to InputStream inputStream = response.getEntity().getContent(); would require some significant changes to the code.
The parseResponseBySiteMap method cannot readily be changed to read an InputStream due to its failover code (to handle non-XML). Lines 273 to 284 in 4009525
An InputStream can only be read once, while a String can be used as input more than once.
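One way around the read-once limitation is to buffer the response body into a byte array and wrap it in a fresh ByteArrayInputStream for each attempt. The sketch below is only an illustration: the class RereadableBody and the method describe are hypothetical names, not HertzBeat code, the fallback decodes as UTF-8 by assumption, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class RereadableBody {

    // Hypothetical helper: try XML first, fall back to plain text on a parse error.
    static String describe(InputStream in) {
        try {
            // Drain the stream once; the byte array can then be re-read any number of times.
            byte[] body = in.readAllBytes();
            try {
                // Parsing from bytes lets the parser apply the encoding declared in the XML.
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse(new ByteArrayInputStream(body));
                return "xml:" + doc.getDocumentElement().getNodeName();
            } catch (SAXException notXml) {
                // Failover path: the same bytes are still available.
                // Assumption: non-XML bodies are UTF-8 text.
                return "text:" + new String(body, StandardCharsets.UTF_8).trim();
            }
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<urlset><url/></urlset>".getBytes(StandardCharsets.UTF_8);
        byte[] txt = "https://example.com/a\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(describe(new ByteArrayInputStream(xml)));
        System.out.println(describe(new ByteArrayInputStream(txt)));
    }
}
```

The cost is one copy of the body in memory, which the String-based code already pays.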
Is there an existing issue for this?
Current Behavior
hertzbeat/hertzbeat-collector/hertzbeat-collector-basic/src/main/java/org/apache/hertzbeat/collector/collect/http/HttpCollectImpl.java
Line 251 in 1fe70a6
If you have a String, you don't need to convert to byte array (which is almost a waste of memory).
DocumentBuilder has a parse(InputSource) method.
https://docs.oracle.com/javase/8/docs/api/javax/xml/parsers/DocumentBuilder.html#parse-org.xml.sax.InputSource-
An InputSource can be constructed to wrap a StringReader that wraps the String.
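A minimal sketch of that approach, using only standard JAXP classes. The class XmlFromString and its parse helper are hypothetical names for illustration, not part of HertzBeat.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlFromString {

    // Hypothetical helper: parse XML that is already held as a String.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // A character stream is already decoded, so no byte array is needed
            // and the encoding named in the XML declaration is not applied again.
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>caf\u00e9</root>";
        System.out.println(parse(xml).getDocumentElement().getTextContent());
    }
}
```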
Expected Behavior
Don't convert Strings to byte arrays unnecessarily; it wastes memory and can cause parse issues. Imagine the XML has a declaration that specifies an encoding other than UTF-8. If you pass the parser the String (as a character stream), it ignores the declared encoding. If you convert to a byte array, the parser uses the declared encoding, but the code has explicitly encoded the bytes as UTF-8, so the two encodings may not match.
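The mismatch can be reproduced with a small, self-contained sketch. EncodingMismatch and parseViaUtf8Bytes are hypothetical names, and the sample document is made up for illustration; the String-to-bytes-to-parser pattern mirrors what the issue describes rather than copying the HertzBeat code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class EncodingMismatch {

    // Hypothetical helper reproducing the problematic pattern:
    // encode the String as UTF-8 bytes, then let the parser decode them.
    static String parseViaUtf8Bytes(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            // Given bytes, the parser trusts the encoding named in the XML declaration.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The declaration says ISO-8859-1, but the bytes handed to the parser are UTF-8.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><root>caf\u00e9</root>";
        String text = parseViaUtf8Bytes(xml);
        // The two UTF-8 bytes of the accented letter are decoded as two ISO-8859-1 characters.
        System.out.println(text.equals("caf\u00e9")); // false
        System.out.println(text.length());            // 5 instead of 4
    }
}
```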
Steps To Reproduce
No response
Environment
Debug logs
No response
Anything else?
No response