Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SRI: One resource, many representations. #235

Open
mikewest opened this issue Apr 2, 2015 · 48 comments
Open

SRI: One resource, many representations. #235

mikewest opened this issue Apr 2, 2015 · 48 comments
Labels
Milestone

Comments

@mikewest
Copy link
Member

mikewest commented Apr 2, 2015

Given the prevalence of content negotiation schemes like https://www.igvita.com/2013/05/01/deploying-webp-via-accept-content-negotiation/, I worry a bit about the current MIME type syntax for integrity metadata. The previous syntax allowed each integrity metadata tuple to contain a distinct MIME type, which would, for instance, allow a server to return a WebP image rather than a PNG one for performance reasons if the UA supported it.

The current scheme has a single type, which applies to every algorithm/digest pair.

This might or might not be a reasonable limitation, but I'd like to see it explicitly considered, as I don't think I've seen any discussion of this impact.

/cc @igrigorik @yoavweiss @metromoxie @fmarier @mozfreddyb @devd

@mikewest mikewest added the SRI label Apr 2, 2015
@annevk
Copy link
Member

annevk commented Apr 2, 2015

We did: https://lists.w3.org/Archives/Public/public-webappsec/2015Jan/0288.html At the time I concluded it could always be added later, is that not true anymore?

@mikewest
Copy link
Member Author

mikewest commented Apr 2, 2015

I missed that thread, sorry. Thanks for dredging it up for me.

So the thought would be that type:XXX's position determines which items it applies to?

@yoavweiss
Copy link
Contributor

It's fairly possible I'm missing something, but wouldn't the problem with conneg and SRI be that the hashes of the different representations would be different? Do we permit multiple hashes that some may not match the returned content? (I guess I could just go read the SRI processing model...)

@mikewest
Copy link
Member Author

mikewest commented Apr 2, 2015

Maybe I should read the processing model. When I originally wrote the thing, I think this was in my head as something to support, but I haven't checked up on things to see where we've evolved. :)

As I noted, if you actively decide this isn't worth supporting, or is actively bad to support, great! Just note that explicitly in the spec somewhere.

@annevk
Copy link
Member

annevk commented Apr 2, 2015

@mikewest yes. @yoavweiss you can already have multiple hashes due to different hashing algorithms.

@yoavweiss
Copy link
Contributor

@annevk - from does resource match metadataList it seems like only the strongest hash is picked and validated against (so there can only be one ). I believe that if there are multiple "strongest" hashes, the first or last one (based on the UA specific priority function) will get picked.

@annevk
Copy link
Member

annevk commented Apr 2, 2015

Right and if we ever wanted to support content negotiation you'd first filter on type and then on "strongest".

@joelweinberger
Copy link
Contributor

I just wrote up that pull request to add a note about content negotiation.

@igrigorik
Copy link
Member

Wait, is there a reason why we're restricting this discussion to content-type negotiation? "One resource, many representations" is a much broader concept. For example, the server may return different representations of the same resource based on any number of advertised request headers... e.g. CH.

To make this concrete, say I have an HTML resource which will vary based on advertised DPR, RW, and possibly MD request headers, how do I use SRI (or not?) for this asset?

@devd
Copy link
Contributor

devd commented Apr 3, 2015

exactly. I think we should just change the algorithm to keep a list of hashes (still insisting on strongest hash function).

@joelweinberger
Copy link
Contributor

To clarify, @devd, you're suggesting we keep a list of hashes, but only keep around the strongest hash algorithm? That is, if we have:
weakhash;deadbeef, stronghash;beefdead
weakhash;feebdaed, stronghash;daedfeeb

We'd keep around both lists, but only
stronghash;beefdead
stronghash;daedfeeb

That would sound reasonable to me, although it is technically independent of whether we should have separate option-expessions for each (but hey, why not if we're at that point?).

@devd
Copy link
Contributor

devd commented Apr 4, 2015

yup, and I think the change is simple enough that we can even push it into v1. wdyt?

@fmarier
Copy link
Member

fmarier commented Apr 5, 2015

Implementation-wise, it's not a huge change, but it's not a trivial one either. Considering the fact that v1 only has support for styles and scripts, are we going to run into problems if there is only a single valid hash value? If not, I'd suggest ensuring forward-compatibility in v1 and moving "list of hashes" to v2.

Regarding the way that we currently specify content-type via a single option. Should we revert back to tying the content-type to each hash value (e.g. sha256-deadbeef-text/css)? It sounds like that's what @mikewest is suggesting.

@annevk
Copy link
Member

annevk commented Apr 5, 2015

  1. I don't think this is needed for v1, just make sure parsing is forward compatible. 2) @fmarier, as @igrigorik pointed out MIME types is not the only bit to negotiate on. Or are we specifying the MIME type for another reason?

@fmarier
Copy link
Member

fmarier commented Apr 5, 2015

@annevk: the MIME type is currently used to enforce that the resource is served with that Content-Type response header.

We have also talked about extending that enforcement to disable MIME type sniffing and ensure that the browser is parsing a script as a script for example.

@annevk
Copy link
Member

annevk commented Apr 5, 2015

  1. Why does that need client-side enforcement? What does that protect against? (Or in other words, why not useX-Content-Type-Options: nosniff?) 2) Why does that need to be strongly coupled with SRI? Isn't it better as orthogonal feature?

@mikewest
Copy link
Member Author

mikewest commented Apr 5, 2015

IMO, we need the type only for content-based caching schemes. Sniffing protection is nice to have, but @annevk is right that that use case is orthogonal.

If you want to punt the whole caching question, you don't need the type. Just leave room for it, please.

@annevk
Copy link
Member

annevk commented Apr 5, 2015

Pointer to content-based caching schemes and why they need a MIME type?

@devd
Copy link
Contributor

devd commented Apr 5, 2015

@annevk fwiw, I agree with you. But this has a bunch of history and after a lot of discussion the consensus was to keep the mime-type optional because a few folks wanted it and I for one was opposed to it. See #176 and #156

@annevk
Copy link
Member

annevk commented Apr 5, 2015

@devd so if it is optional we do not need it for v1 and just need to make sure we can add it at some point. (Perhaps as a distinct feature and for hash-based caching you'd need to use both SRI and that new feature.)

@devd
Copy link
Contributor

devd commented Apr 5, 2015

yeah, but I think this issue is asking exactly that: how do we add it at some point

@annevk
Copy link
Member

annevk commented Apr 5, 2015

Well if at some point we decide that enforcing the MIME type of a resource on the client is important we add that feature, make it detectable somehow, and get it implemented.

If it needs to be tied to the SRI hash for some reason we should make the processing model of the list of hashes such that either today's clients ignore the additional instruction or fail on them, depending on what we want going forward.

@devd
Copy link
Contributor

devd commented Apr 5, 2015

so the current algorithm (Section 3.3.3) is intentionally very forgiving: the parsing involves a split on a space and skipping any token that is invalid and any algorithm that is invalid. In the future, we can add new types of tokens and we should be fine per the spec.

@devd devd added this to the SRI-next milestone Apr 5, 2015
@fmarier
Copy link
Member

fmarier commented Apr 5, 2015

It sounds like what we need to do is:

  1. remove the global type option
  2. provide a way to attach options (like MIME type) to each hash

Given that type is the only option that's currently defined and that we ignore unidentified tokens, I think we can leave the whole concept of global options (those that affect all hashes) out of V1.

In terms of providing a way to attach options to each hash, here are two possible ways to do it:

  1. sha256-deadbeef-type:text/css;something:else
  2. ni:///sha-256;deadbeef?ct=text/css&something=else

In the first option, we don't have to specify exactly how the options will be provided (that's a V2 feature) but we should specify that hashes can have options after the second hyphen and that they should be ignored for now.

The second option may look familiar ;) but it's worth mentioning again since we're getting close to re-designing the same thing. Also it would be relevant if we're interested in caching optimizations in V2.

@devd
Copy link
Contributor

devd commented Apr 6, 2015

agreed. I also vote for removing the global type option in v1.

Re the attach options to each hash: in terms of usability for developers, I think the far more common case is that developers will attach all the possible hashes instead of figuring out which hash applies in which scenario and which option.

Think a bit more about this though, I think we should consider adding the "keep list of strongest hashes" as part of the algorithm in v1. Given my assertion that providing multiple hashes will be more popular and simpler than specifying options for each hash, the current algorithm will make this tricky to adopt. The current algorithm means that browsers with v1 will just choose the first (or last) hash to compare with and so specifying multiple hashes will be a non-intuitive dance. (OR we create a new algorithm name for multiple hashes)

and then worrying about how

@annevk
Copy link
Member

annevk commented Apr 6, 2015

@devd your comment got partially dropped. If you allow multiple hashes, you need an algorithm for walking through them. E.g. picking the first strongest hash and using that, or attempting all strongest hashes...

@devd
Copy link
Contributor

devd commented Apr 6, 2015

So, the algorithm in the spec already talks about how to deal with unknown hashes and how to pick strongest hash. As far as I can tell, we only need to modify it to keep a list of the strongest hashes instead of just picking the first (or last) one.

Does that sound good to you (and everyone else on the thread)? Remove the global type option and tweak the algorithm to keep list of hashes; in v.next, we look into adding options per hash.

@annevk
Copy link
Member

annevk commented Apr 6, 2015

I'm still missing something. Are you suggesting to keep a list and then see if one of them matches the resource (in some order)? That is fine with me.

@devd
Copy link
Contributor

devd commented Apr 6, 2015

So, if you have

sha256-I/Lt/z7ekCWanjD0Cvj5EqXls2lOaThEA0H2Bg4BT/o= sha256-/WoiWRyIOVNa9ihaBciRSC7XHjliYS9VwUGOIud4PB18= sha256-/r/mIkG3eEpVdm+u/ko/cwxzOMo1bk4TyHIlByibiA5E= sha256-//qK31kX7pz11PB7Jp4cMQOH3sMVh6Se5hb9xGGbjbyI= sha256/HqPF5D7WbC2imDpCpKebHpBnhs6fG1hiFBmgBGOofTg=

the current algorithm will just pick the first hash or the last hash. Then, it will do a hash of the contents and if it matches, succeed or fail. Instead, we want the algorithm to keep them all around, hash the content and see if any match.

sound good?

@annevk
Copy link
Member

annevk commented Apr 6, 2015

Yes, that allows for content negotiation of any kind. Combined with ignoring invalid tokens I think we have a forward compatibility story as well.

@fmarier
Copy link
Member

fmarier commented Apr 6, 2015

Does that sound good to you (and everyone else on the thread)? Remove the global type option
and tweak the algorithm to keep list of hashes; in v.next, we look into adding options per hash.

I haven't heard anybody wanting to keep the global options, so I think we can assume consensus here. With respect to the list of hashes and per-hash options, I think we also have consensus that we don't need either in V1 (given we only support scripts and styles) but that we need to ensure forward-compatibility.

In terms of forward compatibility, my assumptions are that:

  1. we won't need content negotiation for scripts and styles (the use case for list of hashes)
  2. authors may add MIME types to the script/style hashes in the future (and so old clients need to treat those as valid too)

Are there assumptions correct?

@devd
Copy link
Contributor

devd commented Apr 6, 2015

Re point 2:
The current v1 draft is forward compatible only if whatever authors end up doing is ignored by v1 implementations. In particular, imagine a scenario in v.next where there is some sort of negotiation for scripts and styles. It is far more likely (or more usable) that authors will just add the hashes for the different mime types (instead of specifying a hash for each mime type). This will not be compatible with v1, since v1 implementations will just pick one hash. That's why I think we should add "keep a list" in v1.

@joelweinberger
Copy link
Contributor

Sorry for my delay in responding (Passover got in the way).

Basically, I'm pretty neutral what we do for v1. Implementation-wise, none are a huge burden for Chrome, so we'll do whatever. I guess I have a slight preference for "multiple hashes, figure out options later" just because it seems like there are fewer edge cases to work out. In either was, as @devd said, we should have "keep a list" in v1 though.

@joelweinberger
Copy link
Contributor

Since I've already got that pull request going, I'll start writing up a new one that:

  • Takes away global type options
  • Keeps the list around of all "strongest hash algorithms."

Does that make sense?

@devd
Copy link
Contributor

devd commented Apr 7, 2015 via email

@fmarier
Copy link
Member

fmarier commented Apr 7, 2015

  • Takes away global type options
  • Keeps the list around of all "strongest hash algorithms."

Are we positive we're not going to need per-hash options? (e.g. per-hash MIME type)

Because we're not likely to be able to add them later unless we specify a separator of some kind in v1.

@joelweinberger
Copy link
Contributor

I don't think it's accurate that we will not be able to add them later (I certainly think we will, in fact). As @devd wrote above,
"so the current algorithm (Section 3.3.3) is intentionally very forgiving: the parsing involves a split on a space and skipping any token that is invalid and any algorithm that is invalid. In the future, we can add new types of tokens and we should be fine per the spec."

The spec leaves us with plenty of room to separate a new "type option" prefix or sufficx from the hash algorithm and hash value. For example, as a strawman, starting with "?text/javascript:" as a way of expressing type options would be a valid extension of the current syntax.

@fmarier
Copy link
Member

fmarier commented Apr 7, 2015

Yeah, I'm not worried about global options, they are easy to add back later if we need them, given that we just ignore unrecognized tokens.

I guess I was thinking that if we ever want to go from:

integrity="sha256-deadbeef"

to your strawman:

integrity="sha256-deadbeef?text/css"

and have it work (w/o MIME type checking) in old implementations then we need to ensure that the second one is valid in v1 too (i.e. adding it to the ABNF productions).

@joelweinberger
Copy link
Contributor

Fair point. It seems straightforward to add them into the grammar now. How about I rewrite integrity-metadata to:

integrity-metadata = _WSP hash-expression [ ":" option-expression ] *(1_WSP hash-expression [":" option-expression] *WSP / *WSP

which should allow expressions of the type:
javascript:sha256-deadbeef:text/javascript

but we can separately debate whether to actually content-type separately?

@fmarier
Copy link
Member

fmarier commented Apr 8, 2015

@metromoxie That seems fine to me. I'm not sure we're going to need both a prefix and a suffix.

So perhaps all we need to add are the productions to get <options>:sha256-deadbeef to work and then we can leave <options> to be defined later, though it should obviously not contain whitespace or colons.

@devd
Copy link
Contributor

devd commented Apr 8, 2015

hmm .. won't

integrity="sha-256-deadbeef?text/css"

get ignored by a UA and thus not get enforced?

@joelweinberger
Copy link
Contributor

I believe that would technically be a parse failure. "?" is not in the grammar, and hash-expression would therefore fail parsing. If you prefer the "?" syntax, I'm happy to make that the separator for type as we had before.

@joelweinberger
Copy link
Contributor

I just updated my pull request (#236) to add per-hash options and remove MIME types. Also, I decided I like "?" postfix syntax, so I rewrote the grammar to use it.

@devd
Copy link
Contributor

devd commented Apr 8, 2015

hmm .. my reading of the spec is that it splits the integrity metadata on spaces and then if it isn't "valid metadata" then that token is skipped. Whats the difference between valid metadata and parse failure?

@joelweinberger
Copy link
Contributor

Well, that would mean that the "sha256-deadbeef?text/css" token is skippend, and I believe the idea is we want that to parse in v1, but only be meaningful in v2. I suppose that doesn't matter as much now that we will support multiple hashes (so you could have both "sha256-deadbeef?text/css" and "sha256-deadbeef"), but it seems a little silly to me to require the same hash twice.

@devd
Copy link
Contributor

devd commented Apr 8, 2015

ok .. that's fine if we want to do that extra work (we did that for CSP uris). In general, I wanted to make sure that any new style token we introduced won't cause old browsers to break. I am fine with skipping for that reasons: "you are protected if you use a recent browser" is a fine thing for any webapp to say.

@joelweinberger
Copy link
Contributor

I think we're in agreement then, but if you can review my pull request and merge it if it looks good, I'll take that as the ultimate agreement :-)

@devd
Copy link
Contributor

devd commented Apr 8, 2015

heh .. yeah. I am more concerned that the current spec is future compatible so wanted to make sure that current implementations aren't breaking down when they see something they don't know.

mikewest pushed a commit to mikewest/webappsec that referenced this issue Jun 29, 2015
Doc: link to public URL for biblio.ref
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants