-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SRI: One resource, many representations. #235
Comments
We did: https://lists.w3.org/Archives/Public/public-webappsec/2015Jan/0288.html At the time I concluded it could always be added later, is that not true anymore? |
I missed that thread, sorry. Thanks for dredging it up for me. So the thought would be that |
It's fairly possible I'm missing something, but wouldn't the problem with conneg and SRI be that the hashes of the different representations would be different? Do we permit multiple hashes that some may not match the returned content? (I guess I could just go read the SRI processing model...) |
Maybe I should read the processing model. When I originally wrote the thing, I think this was in my head as something to support, but I haven't checked up on things to see where we've evolved. :) As I noted, if you actively decide this isn't worth supporting, or is actively bad to support, great! Just note that explicitly in the spec somewhere. |
@mikewest yes. @yoavweiss you can already have multiple hashes due to different hashing algorithms. |
@annevk - from does resource match metadataList it seems like only the strongest hash is picked and validated against (so there can only be one ). I believe that if there are multiple "strongest" hashes, the first or last one (based on the UA specific priority function) will get picked. |
Right and if we ever wanted to support content negotiation you'd first filter on type and then on "strongest". |
I just wrote up that pull request to add a note about content negotiation. |
Wait, is there a reason why we're restricting this discussion to content-type negotiation? "One resource, many representations" is a much broader concept. For example, the server may return different representations of the same resource based on any number of advertised request headers... e.g. CH. To make this concrete, say I have an HTML resource which will vary based on advertised DPR, RW, and possibly MD request headers, how do I use SRI (or not?) for this asset? |
exactly. I think we should just change the algorithm to keep a list of hashes (still insisting on strongest hash function). |
To clarify, @devd, you're suggesting we keep a list of hashes, but only keep around the strongest hash algorithm? That is, if we have: We'd keep around both lists, but only That would sound reasonable to me, although it is technically independent of whether we should have separate option-expessions for each (but hey, why not if we're at that point?). |
yup, and I think the change is simple enough that we can even push it into v1. wdyt? |
Implementation-wise, it's not a huge change, but it's not a trivial one either. Considering the fact that v1 only has support for styles and scripts, are we going to run into problems if there is only a single valid hash value? If not, I'd suggest ensuring forward-compatibility in v1 and moving "list of hashes" to v2. Regarding the way that we currently specify content-type via a single option. Should we revert back to tying the content-type to each hash value (e.g. |
|
@annevk: the MIME type is currently used to enforce that the resource is served with that Content-Type response header. We have also talked about extending that enforcement to disable MIME type sniffing and ensure that the browser is parsing a script as a script for example. |
|
IMO, we need the type only for content-based caching schemes. Sniffing protection is nice to have, but @annevk is right that that use case is orthogonal. If you want to punt the whole caching question, you don't need the type. Just leave room for it, please. |
Pointer to content-based caching schemes and why they need a MIME type? |
@devd so if it is optional we do not need it for v1 and just need to make sure we can add it at some point. (Perhaps as a distinct feature and for hash-based caching you'd need to use both SRI and that new feature.) |
yeah, but I think this issue is asking exactly that: how do we add it at some point |
Well if at some point we decide that enforcing the MIME type of a resource on the client is important we add that feature, make it detectable somehow, and get it implemented. If it needs to be tied to the SRI hash for some reason we should make the processing model of the list of hashes such that either today's clients ignore the additional instruction or fail on them, depending on what we want going forward. |
so the current algorithm (Section 3.3.3) is intentionally very forgiving: the parsing involves a split on a space and skipping any token that is invalid and any algorithm that is invalid. In the future, we can add new types of tokens and we should be fine per the spec. |
It sounds like what we need to do is:
Given that In terms of providing a way to attach options to each hash, here are two possible ways to do it:
In the first option, we don't have to specify exactly how the options will be provided (that's a V2 feature) but we should specify that hashes can have options after the second hyphen and that they should be ignored for now. The second option may look familiar ;) but it's worth mentioning again since we're getting close to re-designing the same thing. Also it would be relevant if we're interested in caching optimizations in V2. |
agreed. I also vote for removing the global type option in v1. Re the attach options to each hash: in terms of usability for developers, I think the far more common case is that developers will attach all the possible hashes instead of figuring out which hash applies in which scenario and which option. Think a bit more about this though, I think we should consider adding the "keep list of strongest hashes" as part of the algorithm in v1. Given my assertion that providing multiple hashes will be more popular and simpler than specifying options for each hash, the current algorithm will make this tricky to adopt. The current algorithm means that browsers with v1 will just choose the first (or last) hash to compare with and so specifying multiple hashes will be a non-intuitive dance. (OR we create a new algorithm name for multiple hashes) and then worrying about how |
@devd your comment got partially dropped. If you allow multiple hashes, you need an algorithm for walking through them. E.g. picking the first strongest hash and using that, or attempting all strongest hashes... |
So, the algorithm in the spec already talks about how to deal with unknown hashes and how to pick strongest hash. As far as I can tell, we only need to modify it to keep a list of the strongest hashes instead of just picking the first (or last) one. Does that sound good to you (and everyone else on the thread)? Remove the global type option and tweak the algorithm to keep list of hashes; in v.next, we look into adding options per hash. |
I'm still missing something. Are you suggesting to keep a list and then see if one of them matches the resource (in some order)? That is fine with me. |
So, if you have
the current algorithm will just pick the first hash or the last hash. Then, it will do a hash of the contents and if it matches, succeed or fail. Instead, we want the algorithm to keep them all around, hash the content and see if any match. sound good? |
Yes, that allows for content negotiation of any kind. Combined with ignoring invalid tokens I think we have a forward compatibility story as well. |
I haven't heard anybody wanting to keep the global options, so I think we can assume consensus here. With respect to the list of hashes and per-hash options, I think we also have consensus that we don't need either in V1 (given we only support scripts and styles) but that we need to ensure forward-compatibility. In terms of forward compatibility, my assumptions are that:
Are there assumptions correct? |
Re point 2: |
Sorry for my delay in responding (Passover got in the way). Basically, I'm pretty neutral what we do for v1. Implementation-wise, none are a huge burden for Chrome, so we'll do whatever. I guess I have a slight preference for "multiple hashes, figure out options later" just because it seems like there are fewer edge cases to work out. In either was, as @devd said, we should have "keep a list" in v1 though. |
Since I've already got that pull request going, I'll start writing up a new one that:
Does that make sense? |
Sgtm
|
Are we positive we're not going to need per-hash options? (e.g. per-hash MIME type) Because we're not likely to be able to add them later unless we specify a separator of some kind in v1. |
I don't think it's accurate that we will not be able to add them later (I certainly think we will, in fact). As @devd wrote above, The spec leaves us with plenty of room to separate a new "type option" prefix or sufficx from the hash algorithm and hash value. For example, as a strawman, starting with "?text/javascript:" as a way of expressing type options would be a valid extension of the current syntax. |
Yeah, I'm not worried about global options, they are easy to add back later if we need them, given that we just ignore unrecognized tokens. I guess I was thinking that if we ever want to go from:
to your strawman:
and have it work (w/o MIME type checking) in old implementations then we need to ensure that the second one is valid in v1 too (i.e. adding it to the ABNF productions). |
Fair point. It seems straightforward to add them into the grammar now. How about I rewrite integrity-metadata to: integrity-metadata = _WSP hash-expression [ ":" option-expression ] *(1_WSP hash-expression [":" option-expression] *WSP / *WSP which should allow expressions of the type: but we can separately debate whether to actually content-type separately? |
@metromoxie That seems fine to me. I'm not sure we're going to need both a prefix and a suffix. So perhaps all we need to add are the productions to get |
hmm .. won't
get ignored by a UA and thus not get enforced? |
I believe that would technically be a parse failure. "?" is not in the grammar, and hash-expression would therefore fail parsing. If you prefer the "?" syntax, I'm happy to make that the separator for type as we had before. |
I just updated my pull request (#236) to add per-hash options and remove MIME types. Also, I decided I like "?" postfix syntax, so I rewrote the grammar to use it. |
hmm .. my reading of the spec is that it splits the integrity metadata on spaces and then if it isn't "valid metadata" then that token is skipped. Whats the difference between valid metadata and parse failure? |
Well, that would mean that the "sha256-deadbeef?text/css" token is skippend, and I believe the idea is we want that to parse in v1, but only be meaningful in v2. I suppose that doesn't matter as much now that we will support multiple hashes (so you could have both "sha256-deadbeef?text/css" and "sha256-deadbeef"), but it seems a little silly to me to require the same hash twice. |
ok .. that's fine if we want to do that extra work (we did that for CSP uris). In general, I wanted to make sure that any new style token we introduced won't cause old browsers to break. I am fine with skipping for that reasons: "you are protected if you use a recent browser" is a fine thing for any webapp to say. |
I think we're in agreement then, but if you can review my pull request and merge it if it looks good, I'll take that as the ultimate agreement :-) |
heh .. yeah. I am more concerned that the current spec is future compatible so wanted to make sure that current implementations aren't breaking down when they see something they don't know. |
Doc: link to public URL for biblio.ref
Given the prevalence of content negotiation schemes like https://www.igvita.com/2013/05/01/deploying-webp-via-accept-content-negotiation/, I worry a bit about the current MIME type syntax for integrity metadata. The previous syntax allowed each integrity metadata tuple to contain a distinct MIME type, which would, for instance, allow a server to return a WebP image rather than a PNG one for performance reasons if the UA supported it.
The current scheme has a single type, which applies to every algorithm/digest pair.
This might or might not be a reasonable limitation, but I'd like to see it explicitly considered, as I don't think I've seen any discussion of this impact.
/cc @igrigorik @yoavweiss @metromoxie @fmarier @mozfreddyb @devd
The text was updated successfully, but these errors were encountered: