-
-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parsing behavior related to a vulnerability #3351
Comments
TL;DR (1) I think there's a spec bug around I'm not surprised that documents with errors cannot round-trip through parsing and serialization. Even documents without errors may not (e.g.,
ParsingAs I read the standard, what should happen is
I believe that matches the output you gave.
Placing
I'm not sure what's supposed to happen when a foreign element is specified inside a table. The problem is that the
I agree.
I don't think foster parenting what's happening actually. When the
Insertion mode 8 is "in table" which seems strange because the
I think this is being handled correctly. nokogiri/gumbo-parser/src/parser.c Line 3581 in 4cceeac
SerializingAccording to Firefox, the first fragment serializes as <math><mtext><mglyph><style><img src=: onerror=alert(1)>
</style></mglyph><table></table></mtext></math> which is similar to, but not exactly what Nokogiri is giving us: <math><mtext><mglyph><style><img src=: onerror=alert(1)></style></mglyph><table></table></mtext></math> ( As you note, this fragment is parsed with the
|
@stevecheckoway Thanks for the thoughtful response.
I'm not, either, and I didn't mean to imply an expectation that they should roundtrip, just highlighting that round-tripped is necessary to see the vulnerability.
Understood. Maybe if I was going to roll up my long initial post, it would be "I think this should be a The spec says
and it doesn't make sense to me that this doesn't take into account the insertion mode that would normally be demanded by the foster parent. §13.2.6 appears to say the token should be processed in foreign content -- it's the |
Yeah, I agree that this seems like the most reasonable behavior but note that something like I was going to test how browsers handle this, but no browser actually supports
Yes, I think that's the bug in the spec. |
Sure, I understand the use case in this example is a subtree that's parse as HTML (not MathML) and I get it.
LOL 🤷
OK, so just to make sure I understand, let me try to rephrase what I think we agree on ... The act of foster parenting If that's a rough statement that we think should be true and reflected in the spec ... what do you think the next steps should be?
|
I think the behavior you describe is the best option with respect to serializing and reparsing. The current behavior is self-consistent in that any HTML element appearing between the
I think it makes sense to open an issue and ask what the expected behavior is here.
My inclination is the parser should match the standard. Can you say more about the vulnerability? I assume this comes from parsing, validating, and serializing attacker-controlled HTML and the problem is that the DOM produced by serializing is different from the DOM used in validation. If that's the case, then I think the best approach is to
For security, I'd also ensure that the serialization of the DOM from step 3 matches the serialization from step 2. I realize this is not ideal since parsing needs to happen twice and (if the security step is followed), serialized twice. One of the serializations can be skipped by comparing the DOMs produced in steps 1 and 3 directly, but that's more complicated and error-prone. |
One thought along these lines. Gumbo currently has flags per node tracking a few things including whether a node was foster-parented out of a table, moved by the adoption agency algorithm, cloned by the adoption agency algorithm, created due to reconstructed formatting elements, and a few other things that seem less important. Currently, nothing uses these flag. As the gumbo source has evolved, I don't think close attention was paid to setting these flags when creating nodes. One option would be to track at the document level whether a node was adopted, reparented, etc. and expose that information to Ruby. Then, the 5 steps I gave above can be simplified since the DOM from step 1 should match the DOM from step 3 so long as no nodes were adopted or foster parented. Going this route would require
Step 1 seems easy (without actually checking). Step 2 involves finding every place foster parenting can occur and testing. Step 3 is probably easy, but I don't have experience with libFuzzer. I've only used AFL to fuzz gumbo and that was long enough ago, that I'm not sure I even have the infrastructure saved somewhere. |
The original post mimics the full lifecycle of Rails sanitizing an untrusted input, and a user browser parsing the serialized result. The first parse is of the untrusted content. Once parsed, Rails::HTML::Sanitizer traverses the DOM looking for disallowed tags and attributes, and removes them. An attribute like Once serialized by Rails and then parsed by a user's browser (the second parse), Reparsing may sufficient for the attack described above; but I suspect it's possible to craft attack payloads with subtrees recursively nested within I'd prefer to just prune big pieces of the DOM than add complexity to the sanitizer simply to try to optimally handle these kinds of cases. The attack vectors only exist if developers override the default allowlist and allow One idea I want to explore is for the sanitizer to remove any "style" tag who has an ancestor element that is foreign content (i.e., that has a namespace). Or, another idea is to simply throw our hands up and say "look, if you override the defaults to allow (( |
(While writing my response, I missed your most recent response above) I like the idea of tracking more detailed parse state (like "fostered"), but that feels like a much bigger lift than I think we should put into this particular vulnerability type. Like I mentioned above, the configurations required are not realistic and I personally question whether it's worth the effort (by us) to mitigate risks with a scalpel that we could more easily mitigate with a machete. |
Now that rails-html-sanitizer v1.6.1 has shipped a mitigation for this vulnerability, I've moved the issue into the public repository so the conversation is public. |
Note from maintainers: the potential vulnerability described below has been mitigated in rails-html-sanitizer v1.6.1. This issue was private until 2024-12-04 when it was moved into the public Nokogiri repository.
@stevecheckoway, I wanted to discuss a thing that came up in the context of a vulnerability report against Rails, that I think is either a problem with the spec or a bug in the gumbo parser (and I'm honestly not sure which).
Here's an explanation of what's going on, as concisely as I can explain:
The reason this is contributing to a security problem is that we can't round-trip that DOM through the parser. Continuing the above snippet ...
And now the
img
tag's onerror handler will run in a browser.I have a few questions here ...
1. mglyph foster parenting in parse 1
In the first parse, when mglyph is foster-parented, is placing it under
mtext
the correct behavior? The spec says in § 13.2.6.4.9:So I think the spec is saying "parse this as HTML content even though the parent is mtext" and so maybe this is a bug in the spec?
The spec also describes how to find the target for foster parenting in § 13.2.6.1, and that guidance seems to also say "use mtext as the parent node" even though it's got very different parsing rules.
In fact,
mtext
appears in the list of nodes that break the "have a particular element in scope" chain in § 13.2.4.2. Shouldn't the spec include a check for "having an HTML element in scope" in the foster parenting instructions?This seems like the parser is following the spec correctly, but the spec is incomplete. WDYT?
I can imagine two possible solutions:
2. style foster parenting in parse 1
It seems like the
style
tag is then also foster parented under the newmglyph
element.I don't know whether this is right or not. The foster parenting instructions say "... and then disable foster parenting" but it seems like gumbo didn't disable it? Maybe I'm misreading the spec.
The text was updated successfully, but these errors were encountered: