HTML5 namespaces do not propagate to parent nodes when adding nodes to a document #2647
Comments
Hi, thanks for opening this issue.

What's going on with the namespaces?

This code snippet clarifies what's going on:

```ruby
#! /usr/bin/env ruby
require "nokogiri"

svg = <<~SVG
  <div>
    <svg height="100" width="100">
      <circle cx="50" cy="50" r="40" stroke="black" stroke-width="3" />
    </svg>
  </div>
SVG

parsed_doc = Nokogiri::HTML5.parse(<<~HTML)
  <html>
    <body>
      <section>
        #{svg}
      </section>
    </body>
  </html>
HTML

assembled_doc = Nokogiri::HTML5.parse(<<~HTML)
  <html>
    <body>
      <section>
      </section>
    </body>
  </html>
HTML

assembled_doc.at_css("section").children = svg

parsed_doc.at_css("circle").ancestors.map do |a|
  [a.name, (a.namespace_definitions rescue nil)]
end
# => [["svg", []],
#     ["div", []],
#     ["section", []],
#     ["body", []],
#     ["html",
#      [#(Namespace:0x3c {
#         prefix = "svg",
#         href = "http://www.w3.org/2000/svg"
#         })]],
#     ["document", nil]]

assembled_doc.at_css("circle").ancestors.map do |a|
  [a.name, (a.namespace_definitions rescue nil)]
end
# => [["svg", []],
#     ["div",
#      [#(Namespace:0x50 {
#         prefix = "svg",
#         href = "http://www.w3.org/2000/svg"
#         })]],
#     ["section", []],
#     ["body", []],
#     ["html", []],
#     ["document", nil]]
```

The thing I'd like to point out here is that …
When the libxml2 tree is created in …

Why this impacts an xpath query without explicit namespaces

The reason …
So: you've still got a way to search properly!

Possible alternative behaviors

The topic of what to do with namespaces during reparenting has been a 🔥 hot 🔥 topic over the years, and unfortunately the specifications do not provide any guidance. We've tried to establish some behavior and implement it as consistently as we can. I will very readily admit that some of the decisions we made may have been wrong; see #1200 for an example of behavior unrelated to this issue that I'd like to change.

Now: I don't think we're obviously doing the wrong thing here. I would be comfortable closing this and saying "behaving as designed," particularly since you can explicitly pass the relevant namespaces to the …

But I also think that, because there are a finite number of legal namespaces in HTML5, there's an opportunity to step back and ask whether there are some assumptions we can make to make this more user-friendly for the HTML5 use case.

Idea 1: when querying HTML5 with xpath, always implicitly include the three legal namespaces

What if xpath queries on HTML5 documents implicitly used the following namespaces:

```ruby
{
  "svg" => "http://www.w3.org/2000/svg",
  "math" => "http://www.w3.org/1998/Math/MathML",
  "xlink" => "http://www.w3.org/1999/xlink",
}
```

Then users might never have to specify namespaces, and this would work in all of the cases above:

```ruby
base.xpath(".//svg:circle")
base.xpath(".//math:mrow")
base.xpath(".//@xlink:href")
```

Idea 2: when reparenting a node with any of these namespaces, copy (move?) them to the document root

As mentioned above, when the libxml2 tree is created in …

We could extend this behavior to reparenting so that, when a node is reparented into an HTML5 document, any of these namespaces will be copied (or moved?) up to the document root. Then the current behavior of …

I don't feel strongly about either of these approaches. Maybe @stevecheckoway can weigh in?
Thank you for the detailed and insightful answer. And I didn't know about explicitly providing the namespaces to xpath, so thank you for fixing my ignorance. I like your idea 1. With this it would be possible to query for xpath …
I like option 1 better, too (after sleeping on it), but I would really like Steve's feedback because he's gone deeper on HTML5 foreign element namespaces than I have.
@stevecheckoway What do you think of option 1 above, implicitly including the …
After parsing a document that contains `<svg>` elements, it's possible to traverse the elements with `xpath(".//svg:svg")`. But if we have a document with no `<svg>` elements to which we then add `<svg>` elements, the svg namespace is not added to the document, so it's impossible to use the above xpath.

To illustrate:
Output:
As we can see above, even though doc1 and doc2 have the same structure, `doc2.namespaces` returns empty, and namespaced xpath queries result in an error for doc2, even for the `div` element that claims to have the namespaces.

Now, it's probably better anyway to use `css("svg")` instead of `xpath(".//svg:svg")`. But I don't think there's an alternative to `xpath(".//@xlink:href")`; at least `css("[xlink:href]")` results in `Nokogiri::CSS::SyntaxError`.