Should screen capture tracks expose deviceId? #308

Closed
youennf opened this issue Nov 1, 2024 · 16 comments · Fixed by #310
Comments

@youennf
Collaborator

youennf commented Nov 1, 2024

As seen in #306, Chrome and Safari are exposing deviceId on getDisplayMedia tracks. Firefox is not.

There are a couple of use cases that could make it useful for exposing deviceId:

  • On successive getDisplayMedia calls within the same video call, web applications can know whether the same surface was selected or not.
  • On surface switching, exposing deviceId can let configurationchange listeners know that a surface switch happened. This could be useful, for instance, with MSTP, which lives in a worker.
  • It may also allow detecting, on surface switching, that the same surface was already captured in the past.
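
A minimal sketch of how an application might use this, assuming deviceId shows up in track.getSettings() for display tracks and that configurationchange fires on a surface switch. TrackLike, sameSurface and watchSurfaceSwitch are illustrative names, not part of any spec; TrackLike is a structural stand-in for MediaStreamTrack so the snippet also runs outside a browser.

```typescript
// Structural stand-in for the parts of MediaStreamTrack used here (assumption).
interface TrackLike {
  getSettings(): { deviceId?: string };
  addEventListener(type: string, listener: () => void): void;
}

// True only when both tracks expose a non-empty deviceId and the values match.
// Browsers that do not expose deviceId yield undefined, which never matches.
function sameSurface(a: TrackLike, b: TrackLike): boolean {
  const idA = a.getSettings().deviceId;
  const idB = b.getSettings().deviceId;
  return idA !== undefined && idA !== "" && idA === idB;
}

// Calls onSwitch when a configurationchange leaves the track with a different deviceId.
function watchSurfaceSwitch(
  track: TrackLike,
  onSwitch: (deviceId: string | undefined) => void
): void {
  let last = track.getSettings().deviceId;
  track.addEventListener("configurationchange", () => {
    const current = track.getSettings().deviceId;
    if (current !== last) {
      last = current;
      onSwitch(current);
    }
  });
}
```

In a page, the two tracks would come from two getDisplayMedia() calls in the same call session; a match only says the same surface was picked, not that it shows the same content.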
@youennf
Collaborator Author

youennf commented Nov 4, 2024

@eladalon1983 raised the question of how deviceIds could be defined and what their lifetime should be.
One approach would be something like:

  • In terms of definition, these ids could be based on identifiers provided by the OS for screens and windows. The UA would be responsible for providing identifiers for tabs.
  • These ids would be hashed like done for cameras and microphones, which would make them stable across capture sessions.

The screen capture spec could try to define tab capture in terms of top level traversable.

@eladalon1983
Member

On the one hand, yes, this seems potentially useful.

On the other hand, realistically, we cannot expect all user agents to prioritize this work in the short-term. I'd be more comfortable allowing this behavior than mandating it.

I'd also suggest scoping the persistence of the ID to successive getDisplayMedia() calls by the same Document. That is:

  • If the user reloads the video conferencing application, the capturing app should get new IDs. Otherwise, we'd be exposing which tabs/windows are still there, which needlessly goes beyond the current MVP.
  • If the user has tab1 dialed to origin1 and tab2 dialed to origin2, and both embed vc-in-iframe.com that calls gDM, then we should expose separate yet persistent IDs for similar reasons.

Wdyt?
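
One way to picture the per-Document scoping described above, as a hedged sketch: a hypothetical table that a user agent could keep for each Document. DocumentSurfaceIds, idFor and randomId are made-up names for illustration, and Math.random stands in for whatever strong random source a real implementation would use.

```typescript
// Hypothetical model of per-Document bookkeeping; not a real browser API.
class DocumentSurfaceIds {
  // Maps an internal (OS- or UA-level) surface identifier to the ID exposed to this Document.
  private readonly exposed = new Map<string, string>();

  idFor(internalSurfaceId: string): string {
    let id = this.exposed.get(internalSurfaceId);
    if (id === undefined) {
      id = randomId();
      this.exposed.set(internalSurfaceId, id);
    }
    return id;
  }
}

// 128 bits rendered as 32 hex characters. Math.random is only good enough for a sketch.
function randomId(): string {
  let out = "";
  for (let i = 0; i < 4; i++) {
    const chunk = Math.floor(Math.random() * 0x100000000).toString(16);
    out += ("0000000" + chunk).slice(-8);
  }
  return out;
}
```

A reload creates a new Document and therefore a new, empty table, so the same window gets a fresh ID. Two Documents embedding the same iframe origin each hold their own table, so their IDs stay separate yet stable for as long as each Document lives.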

@youennf
Collaborator Author

youennf commented Nov 4, 2024

I'd be more comfortable allowing this behavior than mandating it.

I would go with mandating the behaviour, otherwise there is no real point for a spec.
I think it is ok for the spec to go first if we are confident implementations will follow.

  • If the user reloads the video conferencing application, the capturing app should get new IDs. Otherwise, we'd be exposing which tabs/windows are still there, which needlessly goes beyond the current MVP.

What are we trying to protect?
Thinking about it more, given groupId is not a stable ID, I think it is fine to do the same for display tracks.

  • If the user has tab1 dialed to origin1 and tab2 dialed to origin2, and both embed vc-in-iframe.com that calls gDM, then we should expose separate yet persistent IDs for similar reasons.

Device IDs should be hashed per origins yes, just like cameras and microphones.
Or hashed per document, like groupId.

@youennf
Collaborator Author

youennf commented Nov 4, 2024

As per the main media capture spec, "The group identifier MUST be uniquely generated for each [document](https://dom.spec.whatwg.org/#concept-document)." We would reuse that here.

@eladalon1983
Member

I would go with mandating the behaviour, otherwise there is no real point for a spec.

The authors of RFC 2119 seem to have had a different opinion, and that appears to be the industry standard atm. Specifically, I am thinking we could use "SHOULD" here.

If the user reloads the video conferencing application, the capturing app should get new IDs. Otherwise, we'd be exposing which tabs/windows are still there, which needlessly goes beyond the current MVP.

What are we trying to protect?

Thinking out loud:

  • If I am in a video call and I share the same tab/window twice in a row during the call, it might be reasonable that the Web application knows it's the same tab.
  • If I share the same tab/window the next day, this becomes a bit less expected.
  • The more days pass - the more unexpected it becomes.

Is it a problem? I am not sure. But do we want to engage a lengthy privacy review of something no Web developer has asked for yet? I worry it might not be good use of anyone's time. If we start small and it's proven useful, we could consider extensions later.

Device IDs should be hashed per origins yes, just like cameras and microphones.

Great, we're in agreement here. And likely this already yields the result I was after for the previous point?

@youennf
Collaborator Author

youennf commented Nov 9, 2024

Yes, I think starting with per-document IDs is ok with me.

@jan-ivar
Member

jan-ivar commented Nov 13, 2024

I don't find the use cases compelling.

  • On successive getDisplayMedia calls within the same video call, web applications can know whether the same surface was selected or not.

This seems to invite applications to make mistaken assumptions.

Same tab ≠ same document.
Different tab ≠ different document.

  • On surface switching, exposing deviceId can let configurationchange listeners know that a surface switch happened. This could be useful, for instance, with MSTP, which lives in a worker.

Other alternatives in w3c/mediacapture-screen-share-extensions#15 seem simpler than overloading configurationchange

  • It may also allow detecting, on surface switching, that the same surface was already captured in the past.

Again, any assumption of what "same" means here seems unsound. Tabs may have navigated.

@eladalon1983
Member

I don't find the use cases compelling.

Generally, I also don't have use cases in mind at the moment, but I'd love for us to phrase the spec in a way that would not constrain Safari from offering functionality they believe in. Hence my suggestion of SHOULD.

I do have one compelling, semi-adjacent use case, though - that of knowing when the app is capturing the current tab. At the moment, this is possible, but requires non-trivial work; see the explanation and demo. Possibly we could specify that self-capture (the app capturing its own tab) is exposed via a unique ID that's pre-specified?

This seems to invite applications to make mistaken assumptions.

That's a valid concern, but I am not sure it's a significant one - Web developers are competent enough to either use the information correctly, or to knowingly choose to employ it as a heuristic, despite its failures. (I find the question of whether any Web developer needs this more convincing.)

@jan-ivar
Member

deviceId does not enable self-capture detection AFAIK, so that seems like a separate issue.

I'd love for us to phrase the spec in a way that would not constrain Safari from offering functionality they believe in. Hence my suggestion of SHOULD.

I agree with @youennf there is no real point in a spec like that.

Web developers are competent enough to either use the information correctly, or to knowingly choose to employ it as a heuristic, despite its failures.

Some web developers are, others absolutely not.

My point here is that there are no use cases for detecting the same navigable. It is the wrong solution to a problem.

@eladalon1983
Member

So there is no consensus on the general case. What about the following specific case?

Possibly we could specify that self-capture (the app capturing its own tab) is exposed via a unique ID that's pre-specified?

We could say that deviceId has to be "current-browser-surface" or "current-navigable" or "this" or something like that in that case.

@jan-ivar
Member

We've agreed that self-capture as a use case deserves its own API.

Self-capture as a mistake seems better solved in browser UX.

Self-capture today carries unique risks, so I'm not in favor of making it (more) detectable.

@youennf
Collaborator Author

youennf commented Nov 19, 2024

My point here is that there are no use cases for detecting the same navigable. It is the wrong solution to a problem.

We are not talking same navigable but same surface.
There are use cases, as described above, and one more below.
More generally, given native applications have this information and make use of it, I do not see why it would not be useful for web pages too.

Please also note that surface ID information already surfaces in track.label in all browsers.

Other alternatives in w3c/mediacapture-screen-share-extensions#15 seem simpler than overloading configurationchange

Here is a use case that is better served by configurationchange.
Let's say I am doing a recording of the screen with microphone. Recording is done via MSTP in a worker.
The web application wants to chapterize the recording to easily navigate to specific parts of the recording.

Changing of surface is a good event as input for chapterization.
Getting the ID of the surface is nice, as it can help the user skip to the next time they share the same surface (say the user is switching between two different screens, or two windows...).
Surface type is a partial solution, surface ID would be good.
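
The chapterization idea could be sketched as follows, assuming the recorder ends up with a log of surface switches, each carrying a timestamp and the deviceId in effect from that point. SurfaceSwitch, Chapter, buildChapters and nextChapterWithSameSurface are hypothetical names; how the log is collected (configurationchange, frame metadata, or something else) is left open here.

```typescript
// One entry per observed surface (assumed input shape).
interface SurfaceSwitch {
  timestampMs: number;
  deviceId: string;
}

interface Chapter {
  startMs: number;
  deviceId: string;
}

// Starts a new chapter each time the deviceId differs from the previous entry.
function buildChapters(switches: SurfaceSwitch[]): Chapter[] {
  const chapters: Chapter[] = [];
  for (let i = 0; i < switches.length; i++) {
    const s = switches[i];
    const last = chapters[chapters.length - 1];
    if (last === undefined || last.deviceId !== s.deviceId) {
      chapters.push({ startMs: s.timestampMs, deviceId: s.deviceId });
    }
  }
  return chapters;
}

// Finds the next later chapter that shows the same surface as chapters[index], if any.
function nextChapterWithSameSurface(chapters: Chapter[], index: number): Chapter | undefined {
  const current = chapters[index];
  if (current === undefined) return undefined;
  for (let i = index + 1; i < chapters.length; i++) {
    if (chapters[i].deviceId === current.deviceId) return chapters[i];
  }
  return undefined;
}
```

With only the surface type available, the lookup in nextChapterWithSameSurface could not tell two different windows apart, which is why the surface ID matters for this use case.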

Also, the web page will want to know when the first video frame of the new surface will happen, to encode it as a key frame and to get its precise timestamp.

This issue is not solvable with the callback approach (which has the disadvantage of pausing video frames which we do not want here).

One way to solve this is to synchronise settings/configurationchange with the enqueuing of VideoFrames in the worker (same task queue basically).
We really need to decide what we do about this synchronisation (ditto with applyConstraints promise resolution).

Another solution is to add metadata to VideoFrames.
This could end up meaning adding all video track settings to VideoFrame metadata, which can be quite a hammer.

I'll file an issue about this.

@jan-ivar
Member

We are not talking same navigable but same surface.

In the case of tab capture, the surface is (the rendered form of) a navigable.

Let's say I am doing a recording of the screen with microphone. Recording is done via MSTP in a worker.
The web application wants to chapterize the recording to easily navigate to specific parts of the recording.

Changing of surface is a good event as input for chapterization.
Getting the ID of the surface is nice, as it can help the user skip to the next time they share the same surface (say the user is switching between two different screens, or two windows...).

I'm not sure how realistic that use case is. It seems to make a lot of assumptions, like the user having several screens or windows of reasonably similar dimensions (to produce a recording with a consistent resolution).

Maybe with tab capture and different tabs of the same window, which would all have the same resolution. But even then, it's not clear what should constitute a "chapter":

  • if I source switch to a document in a different tab I get a chapter
  • if I navigate to a different website within the same tab, I don't get a chapter

If we're serious about solving this use case, a better approach might be to relax track.label to update to reflect the currently loaded document. This would allow for per-origin chapters for instance.

@dontcallmedom-bot

This issue had an associated resolution in WebRTC December 2024 meeting – 10 December 2024 ([screen-share] Issue 308: Should screen capture tracks expose deviceId?):

RESOLUTION: Consensus to add deviceId to settings of a track

@youennf
Collaborator Author

youennf commented Dec 18, 2024

I had a look at other sources (canvas capture track, webrtc remote track, web audio track).
Chrome is exposing a deviceId for all of them.
Safari and Firefox are not exposing one for any of them.

@jan-ivar
Member

Please file a bug on chromium.
