Privacy concerns regarding Reporting API #169
Thanks for the feedback; replies inline.
Since we already have a conversation on this in #158, let's continue this particular discussion there.
Good questions. We don't currently state anything about extension APIs, and the challenge here is that there is no (afaik) shared standard or spec for WebExtension APIs across different browsers. That said, Reporting relies on Fetch for delivery, and my intuition is that if someone were to attempt defining what such an API should be able to see, it ought to be integrated and done at the Fetch layer. For reporting requests in particular we set destination=reporting, so one can detect them as such within Fetch.
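To make the Fetch-layer idea a bit more concrete, here's a rough Python sketch of how a blocking tool could classify requests if its (hypothetical) request-observation API exposed the Fetch destination. `ObservedRequest`, `is_report_upload` and `split_reports` are made-up names, not any browser's extension API. The sketch matches the value `report`, which is how the Fetch standard's destination enumeration spells it (the comment above writes it as `reporting`).

```python
from dataclasses import dataclass

# Fetch "destination" value used for Reporting API uploads.
REPORT_DESTINATION = "report"


@dataclass(frozen=True)
class ObservedRequest:
    """Hypothetical view of a request, as an extension-like filter might see it."""
    url: str
    method: str
    destination: str  # e.g. "image", "script", "report"


def is_report_upload(req: ObservedRequest) -> bool:
    # The CORS preflight (OPTIONS) inherits the same destination, so only the
    # POST that actually carries report data is flagged here.
    return req.destination == REPORT_DESTINATION and req.method.upper() == "POST"


def split_reports(
    requests: list[ObservedRequest],
) -> tuple[list[ObservedRequest], list[ObservedRequest]]:
    """Partition requests into report uploads and everything else."""
    reports = [r for r in requests if is_report_upload(r)]
    others = [r for r in requests if not is_report_upload(r)]
    return reports, others
```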
I don't follow: how do Origin Policy and limiting endpoints to eTLD+1 relate to each other?
You can see all the details here: https://w3c.github.io/reporting/#try-delivery. Re cookies: see #161.
This is all very helpful, thank you @igrigorik. Another key difference here is that since messages are sent via POST, most blocking tools (especially post-Manifest V3) will lose the ability to reason about these. What about a
Apologies, that's me being a dummy. I didn't mean Origin Policy, I meant something like First-Party Sets (https://mikewest.github.io/first-party-sets/). No excuses on my end for the error, I don't know what I was thinking :)
I don't know if Fetch would be thrilled to introduce the concept of "sub-destinations". Regarding availability of the destination to extensions, see above :)
Can you describe the information leak scenario you're concerned with in a bit more detail?
I see in the above that "it's not specified anywhere" currently, which isn't ideal, but at the very least it would be useful to know what the plans are for major implementers, to reason through the privacy implications of Reporting API vs
Since the Reporting API would give privacy / blocking tools less info to work with (or at least that seems like a possibility), and potentially open up new types of privacy-sensitive info (e.g. NEL), it would be ideal to balance this loss by restricting / preventing cross-domain communication. E.g. prevent (or at least slow) the creation of a 3p "track users through network information" service, or the translation of existing tracking services to Reporting API endpoints. (I don't know if I've answered your question; happy to take another pass at it if not…)
Ah, hmm.. worth exploring. Off the top of my head, a few caveats:
That doc seems to have moved; it's at https://github.com/krgovind/first-party-sets now. It would be difficult for a reporting service provider to use that, though. It doesn't seem to be possible for an origin to be in more than one set, for one. The provider could try to own a single set containing all of its clients, but there is also a limit suggested of 20-30 origins per set, and it would also tie each of the clients to the others as part of the same set. CNAMEs would allow a provider to get around that (either at the provider or client DNS), but as @igrigorik says, I don't know if we want to encourage (or require) that as a way around this.
Reporting uploads are subject to CORS checks (comparing the origin of the report and the collector), so the user agent will have to send out a preflight request for 3p report uploads before POSTing the actual report content. Does that give us what we need for this?
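For anyone following along, here's a simplified Python sketch of that CORS gate, assuming uploads are POSTed with `Content-Type: application/reports+json` (not a CORS-safelisted type, so any cross-origin upload gets preflighted). The helper names and the `credentialed` flag are illustrative assumptions, and the sketch skips parts of the real Fetch CORS check, such as the preflight's status code.

```python
from urllib.parse import urlsplit

# Content types that do not, on their own, force a CORS preflight.
SAFELISTED_CONTENT_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
    "text/plain",
}
DEFAULT_PORTS = {"http": 80, "https": 443}


def serialize_origin(url: str) -> str:
    """Return the origin of a URL, e.g. 'https://example.com' (default port omitted)."""
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        origin += f":{parts.port}"
    return origin


def needs_preflight(document_url: str, endpoint_url: str,
                    content_type: str = "application/reports+json") -> bool:
    """Same-origin uploads need no CORS; a cross-origin POST with a
    non-safelisted Content-Type must be preflighted first."""
    if serialize_origin(document_url) == serialize_origin(endpoint_url):
        return False
    essence = content_type.split(";")[0].strip().lower()  # ignore MIME parameters
    return essence not in SAFELISTED_CONTENT_TYPES


def preflight_allows(document_url: str, response_headers: dict[str, str],
                     credentialed: bool = False) -> bool:
    """Simplified check of the collector's OPTIONS response: it must allow the
    report's origin and the Content-Type request header. POST is a safelisted
    method, so Access-Control-Allow-Methods is not consulted."""
    headers = {k.lower(): v.strip() for k, v in response_headers.items()}
    allow_origin = headers.get("access-control-allow-origin", "")
    origin_ok = allow_origin == serialize_origin(document_url) or (
        allow_origin == "*" and not credentialed)
    if credentialed and headers.get("access-control-allow-credentials") != "true":
        return False
    allowed = {h.strip().lower()
               for h in headers.get("access-control-allow-headers", "").split(",")}
    header_ok = "content-type" in allowed or ("*" in allowed and not credentialed)
    return origin_ok and header_ok
```

The point for this thread: a third-party collector only receives report content after it has explicitly answered the preflight for the reporting site's origin.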
I think that CORS is sufficient for (3), and I hope it mitigates (4); CORS indicates cooperation between the site and the endpoint. Tying it to domain name will just force people into CNAME tricks, without improving anything substantially. For (1), I raised a proposal at TPAC to tie reports and reporting configuration for Crash, CSP, Deprecation, Feature Policy and Intervention reports to document lifetime. Network Error Logging and similar out-of-document reports have different requirements, but the plan is to start a separate document to standardize their behaviour. I think @igrigorik and @yoavweiss covered (2) and (5).
Re (1), that sounds great. Tying to document lifetime would be a strong improvement. Re (3) and (4), I don't think CORS addresses the concern, which is generally about restricting where this information can travel to a set of parties the user can reason about (since all this functionality involves the site riding on the user to help the site owner achieve the site owner's goals and responsibilities). We discussed briefly at TPAC PING the idea that sites would need to explicitly request the user's permission to use the Reporting API (e.g. "would you like to submit diagnostics info to {sites,urls} X, Y and Z to help improve the site?"). That would of course mitigate this concern.
Any particular reason why reporting requests are different from other cross-origin requests when it comes to user expectations? Are there implementations that are considering gating cross-origin requests behind a permission?
I would also like to learn more about the threat model, in particular how it pertains to reporting that is scoped to document(s).
So it's not a "threat model" question; it's whether it's appropriate for the site to treat the user as the site's debugging agent without the user's consent, especially when some of that debugging activity may be harmful to user interest / privacy (re: intervention, NEL, etc.).
Thanks, that's helpful.
I like how this is stated. If I look at the deprecation report, that tells the site that it's using a feature that my browser will soon no longer support. That feels very much in the spirit of user agency, and something I'd want sites to know about. Conversely, the intervention report does not feel like it advocates for anything on the user's behalf. The example given in the spec, reporting that a request to play audio was blocked due to a lack of user activation, feels like my browser just betrayed me to the site. Will we see sites abuse the Reporting API and put up a paywall (or other annoyance) if I don't let audio or video autoplay? Does a crash report advocate for the user? Maybe? It seems more important for the browser vendor to know about this, but I suppose a site that sees a spike in crash reports after an update (for example) might be prompted to look into it and/or fix things more quickly.
Re deprecation reports, since we're in a world where there are 3 popular browser runtimes, site authors could just as easily get this information themselves though, no? Or via linting or bots or etc etc etc. Reporting API / additional network behavior seems like an unnecessarily chatty way of fixing that problem.
I think this discussion is conflating/merging a few different questions:
Regarding (1), I think the answer is an unequivocal "no". If we (as we plan) want to use the API for e.g. CSP violation reports that are also available from JS, placing any restrictions on Reporting API generated requests will just cause folks to gather that info in JS and use Fetch to send it to whatever report collector they may choose. That's not something we necessarily want to encourage, and above all, it makes no sense to go down that route. Regarding (2), I think we can discuss if access to each piece of information (through both Reporting API requests and
Regarding (3), I think it would be interesting to have a broader discussion on that point, including PING, TAG, relevant WGs and browser vendors.
Perhaps "Report Types" is best split from this document after all, as these are quite novel reporting use cases that each deserve their own scrutiny. Of those, Mozilla is interested in crash reporting, but we would perhaps require user consent in the crash screen, similar to how we require that for reporting such information to Mozilla today.
The immediate case for the API's usefulness is CSP reporting. Feature Policy reporting also seems highly valuable and will rely on the API (but it is to some extent "new information", if you consider Feature Policy to be new).
Probably. I think that in order to avoid confusion, it might be best to split those out to a separate proposal that relies on this one. Then we can discuss privacy implications vs. utility for each one of those data types, while making it clear that the Reporting API infrastructure is indeed something that we can move forward with. @clelland, @igrigorik - thoughts?
Again, I think it makes sense to discuss these questions separately for crashes, deprecations and interventions, as I suspect the answer may vary widely between them.
Can I take that as you withdrawing your proposal for gating the Reporting infrastructure behind a permission until that higher level question is further discussed? @cwilso what would be the best venue for such a discussion that spans the PING, TAG, WG chairs and WG members? We'd like to determine if it makes sense for specifications to include specific mitigations when it comes to security and privacy, and if so, how those mitigations are decided upon.
Yes, splitting those out might make sense, leaving Reporting to just define the infrastructure. What's the right place for deprecation/crash/intervention reports? HTML? Another document in this repo? Somewhere else entirely?
There was some previous discussion about this in #80 and #60 (comment). At the time, we decided to include everything in a single spec to "keep overhead low", though tbh I've always been more of a fan of the modularity that we'd get from separating them out.
Three separate incubation repositories makes the most sense to me at this point given the levels of interest expressed and there not being much overlap in the topics. I could see crashing ending up being defined by HTML at some point (seeing how it manages agent cluster allocation).
What would be the privacy rationale for requiring user consent for merely reporting that a tab crashed, and that the type of the crash was an "oom"? Mozilla's browser crash reporting gathers stacks and other low-level user information which are very privacy-sensitive, very much unlike the 2 bits of Reporting API info. What is the threat model to a site being able to measure changes in crash rates reliably?
Setting aside the possible privacy harms from learning about hardware capabilities, and that this could be used to circumvent all sorts of fingerprinting surface reduction (e.g. if the UA string gets frozen, stuff like this might be able to pull back out UA version info, etc.)… What frequent, user-directed goal does this information serve that couldn't be gathered in a more privacy-respecting manner? If sites are worried they're going to be causing OOM errors on their users, that seems like something site owners could easily detect with automated testing, asking users to opt in to sharing info, or just dogfooding.
Sure, but Mozilla doesn't send it to arbitrary 3rd parties! There's not a parallel between the trust a user puts in a piece of software they download, install and use every day, and a website they visited where their information went who-knows-where (from the user's perspective).
Detecting memory-use regressions on a site that has an infinite number of user data and code variations. We are running N simultaneous A/B experiments where N is very large, with newsfeeds and chat messages that are unique to each user. Lab testing cannot catch these regressions. This isn't theoretical; I've been dealing with hard-to-debug Facebook.com memory leaks for a year now. I also wrote the memory use lab tests, and they are not enough. In fact, this is why Mozilla and Google collect crash dumps from the field instead of just relying on lab testing. And I still don't see how reporting that a tab OOMed is not privacy respecting.
Mozilla collects crash information from 3rd party sites (e.g. banking) and sends it to themselves.
If you'd like to recruit users to help you debug theoretical, rare cases (the vast vast vast majority of websites don't have to worry about OOM, of course), then surely the polite, privacy-respecting thing to do is to ask them, i.e. permissions.
Again, users trust Mozilla; they don't trust arbitrary websites. Unless my understanding is very off, Mozilla isn't sending crash information to third parties (is this incorrect?)
I worked on Telemetry at Mozilla and occasionally touched crash reporting, and I can tell you OOMs in sites are common. In fact OOMs were the top reason for crashes for a long time.
I would also add that memory leaks are in fact extremely common in JavaScript and this is why major browsers ship some great developer tools for debugging them. None of this is theoretical.
I'll stop now since it seems like there is rough agreement to split the types of reports from the Reporting API discussion in general, and this might all be moot anyway :)
I think it might be better to take this discussion to the related crash/oom report repo, once one is created. I have many opinions on that front, but I'll save them for later :)
FYI: the new repos are up now:
Awesome work Ian, thanks for splitting these up! On a quick pass..
@clelland - Can we maybe open separate issues for the remaining work from #169 (comment) (if we haven't already) and close this one?
I'll open new issues in the three related specs about processing, and one here to ensure that the requirement to declare observability is a MUST for any integrating specs.
The other issues raised here, namely:
can be covered by specific issues. If there are others, let me know, and we can open additional issues for discussion.
Before this is closed: there are privacy aspects of the Reporting API that are independent of the types of reports. If these have been dealt with elsewhere I'm happy to have this issue closed out, but I just want to make sure they're not lost:
Re: Lifetime: with this spec, for reports generated as part of documents, the intention is that the report lifetime is tied to the document lifetime. Endpoint configuration data is tied strongly to the document; it is not persisted past that, or used for any other documents. The reports themselves should be sent reasonably quickly, and I believe that @annevk suggested that the timing should match that of fetch.keepalive at the latest. Reports can be sent anywhere that follows CORS protocols: same-origin has no issues, while any cross-origin endpoint (same-site or not) requires CORS opt-in by the receiving party, which should match the behaviour of subresource requests, or any other way that this data could be exfiltrated if reporting was unavailable.
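One way to picture the document-bound model described here, as a hypothetical Python sketch (not browser code, and not an API any browser exposes; all names are made up): endpoint configuration lives on a per-document object, reports can only be queued against endpoints that document configured, and unloading the document hands back whatever is still pending (which, per the comment above, should go out promptly, keepalive-style) and forgets the configuration.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentReportingContext:
    """Hypothetical per-document reporting state; nothing here outlives the document."""
    endpoints: dict[str, str] = field(default_factory=dict)       # endpoint name -> URL
    queued: list[tuple[str, dict]] = field(default_factory=list)  # (endpoint URL, report body)
    alive: bool = True

    def configure(self, name: str, url: str) -> None:
        """Record an endpoint for this document only."""
        if not self.alive:
            raise RuntimeError("document has been unloaded")
        self.endpoints[name] = url

    def queue_report(self, endpoint_name: str, body: dict) -> bool:
        """Queue a report; drop it if the document is gone or the endpoint is unknown."""
        if not self.alive or endpoint_name not in self.endpoints:
            return False
        self.queued.append((self.endpoints[endpoint_name], body))
        return True

    def take_pending(self) -> list[tuple[str, dict]]:
        """Hand over queued reports for delivery and clear the queue."""
        pending, self.queued = self.queued, []
        return pending

    def unload(self) -> list[tuple[str, dict]]:
        """Document teardown: return what is still pending, then forget the configuration."""
        pending = self.take_pending()
        self.endpoints.clear()
        self.alive = False
        return pending
```

Nothing in this object is shared across documents, which is the property the "pull the plug" argument later in the thread relies on.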
(Thanks, @annevk, that was way more concise than me :) )
@annevk Re document lifetime, is this no longer correct then? Or am I reading it incorrectly? It seems to say reports can be sent from service workers too, and so not document length. Apologies if I'm misunderstanding here. Re "anywhere", this is still very surprising to me. All the motivations above have been "it'd be useful for Facebook to know if users are out of memory on Facebook" or "it'd be useful for Facebook to know that some feature Facebook uses may not be supported in the future, etc". Bracketing whether it's good / bad / whatever to send the information to Facebook, it seems even less compelling to say "it'd be useful for Facebook to know if users are out of memory, so I'm sending the data to someone who's not Facebook". Anyway, all that is to say, I understand they're currently defined in terms of
Allowing reports to be sent to third parties enables several things:
Just to add to @clelland's point above, we host our reporting endpoint on a third-party domain since it runs on GCP Functions, so "third party" isn't always a clear-cut distinction. We use GCP Functions for several reasons, but one of them is that even if all of our DNS is broken, we can still receive reports (unless Google's DNS is concurrently broken).
@pes10k service worker lifetime is generally bound to the document. In Firefox that is not true for push notifications at the moment, but we are considering changing that. Letting scripts run in the background was a mistake and we don't plan to expand on that. I don't really see what a same origin/site restriction would buy you here. The information that can be collected should generally be information that a site can already collect and share with whoever using
Thanks @annevk, I definitely agree with the above! If the intent is to tie these to document length though (which seems good to me, since it allows users to "pull the plug" if they don't trust the site), would y'all be open to just changing things to make them explicitly tied to the document length? Re same-site vs everywhere (cc @neilstuartcraig @annevk @clelland), thank you for the details here. I still think there is valid reason to distinguish where this data can be sent, since it's primarily focused on helping the site and not the user. But I appreciate the points you've made, and of the concerns I've mentioned in this issue and others, it's the one I feel least strongly about, so I'm happy to drop it.
Well, the service worker can do its own reporting, as can other workers. It makes sense for that to be bound to their lifetime as they can close sooner than the document. And in the case of shared workers, they could outlive a document as long as the documents they are bound to are same origin. I think in case of lifetime concerns it's better to address the root cause. It seems unlikely implementations will get the infrastructure right otherwise. And it's unlikely to help, as those workers could just do their own fetches.
I think this looks very promising, and am grateful for y'all putting it together! I see a couple of privacy-concerning issues though, that I'd like to work through / address: