-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Extra content at the end of the document #161
Conversation
## Why? XML with multiple root elements is invalid. See: ruby#160 (comment)
641d9d1
to
4e9de51
Compare
## Why? XML declaration must be the first item. https://www.w3.org/TR/2006/REC-xml11-20060816/#document ``` [1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* ) ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog ``` [22] prolog ::= XMLDecl Misc* (doctypedecl Misc*)? ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl ``` [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>' ``` See: ruby#161 (comment)
lib/rexml/parsers/baseparser.rb
Outdated
return [ :start_element, tag, attributes ] | ||
end | ||
else | ||
text = @source.read_until("<") | ||
if text.chomp!("<") | ||
@source.position -= "<".bytesize | ||
end | ||
if @tags.empty? and @have_root | ||
if text.strip != "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strip
allocates a new string. Can we avoid it?
For example: /\A\s*\z/.match?(text)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
I see.
## Why? XML with additional content at the end of the document is invalid. https://www.w3.org/TR/2006/REC-xml11-20060816/#document ``` [1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* ) ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc ``` [27] Misc ::= Comment | PI | S ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI ``` [16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>' ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget ``` [17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l')) ```
4e9de51
to
c094825
Compare
## Why? XML declaration must be the first item. https://www.w3.org/TR/2006/REC-xml11-20060816/#document ``` [1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* ) ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog ``` [22] prolog ::= XMLDecl Misc* (doctypedecl Misc*)? ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl ``` [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>' ``` See: ruby#161 (comment)
Thanks. |
## Why? XML declaration must be the first item. https://www.w3.org/TR/2006/REC-xml11-20060816/#document ``` [1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* ) ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-prolog ``` [22] prolog ::= XMLDecl Misc* (doctypedecl Misc*)? ``` https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-XMLDecl ``` [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>' ``` See: #161 (comment)
@naitoh @kou After this change, parsing of The main cause is this if statement at lib/rexml/parsers/baseparser.rb:498
|
|
@kou Well, I use socket to get xml messages from the server and parse them using PullParser. Each message is complete and valid. Before change it worked like a charm. Now it doesn't work anymore. |
OK. Could you open a new issue for it? |
Why?
XML with additional content at the end of the document is invalid.
https://www.w3.org/TR/2006/REC-xml11-20060816/#document
https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc
https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI
https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PITarget