Should screen capture tracks expose deviceId? #308

youennf · 2024-11-01T19:29:21Z

As seen in #306, Chrome and Safari are exposing deviceId on getDisplayMedia tracks. Firefox is not.

A few use cases could make exposing deviceId useful:

- On successive getDisplayMedia calls within the same video call, web applications can know whether the same surface was selected or not.
- On surface switching, exposing deviceId can let configurationchange listeners know that a surface switch happened. This could be useful, for instance, with MSTP, which lives in a worker.
- On surface switching, it may also reveal that the same surface was already captured in the past.
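
The three use cases above come down to comparing deviceId values over time. A minimal sketch of that bookkeeping follows, assuming the page reads `track.getSettings().deviceId` after each getDisplayMedia call and inside a configurationchange listener. The `SurfaceHistory` class and its method names are hypothetical, not part of any spec.

```typescript
// Hypothetical helper: remembers which surface deviceIds a page has observed.
// Assumes deviceId is exposed in the track settings, which this thread debates.
type SurfaceObservation = {
  switched: boolean;   // differs from the previously observed surface
  seenBefore: boolean; // matches a surface observed earlier in this session
};

class SurfaceHistory {
  private current: string | undefined;
  private readonly seen = new Set<string>();

  // Call with track.getSettings().deviceId after getDisplayMedia() resolves,
  // and again from a "configurationchange" listener.
  observe(deviceId: string | undefined): SurfaceObservation {
    if (deviceId === undefined) {
      // The user agent exposes no ID, so nothing can be concluded.
      return { switched: false, seenBefore: false };
    }
    const switched = this.current !== undefined && this.current !== deviceId;
    const seenBefore = this.seen.has(deviceId);
    this.current = deviceId;
    this.seen.add(deviceId);
    return { switched, seenBefore };
  }
}
```

Note that matching IDs only say the same surface was picked. They say nothing about which document the surface currently shows.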

youennf · 2024-11-04T07:30:50Z

@eladalon1983 raised the question of how deviceIds should be defined and what their lifetime should be.
One approach would be something like:

- In terms of definition, these ids could be based on identifiers provided by the OS for screens and windows. The UA would be responsible for providing identifiers for tabs.
- These ids would be hashed, as is done for cameras and microphones, which would make them stable across capture sessions.

The screen capture spec could try to define tab capture in terms of top-level traversables.
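
The hashing step in the approach above could look roughly like the sketch below. Everything here is an assumption made for illustration: the `deriveSurfaceDeviceId` name, the salt parameter (standing in for whatever per-origin secret a user agent keeps), and the use of 32-bit FNV-1a over UTF-16 code units, which is not cryptographic and only stands in for a proper keyed hash.

```typescript
// Stand-in hash (32-bit FNV-1a over UTF-16 code units). NOT cryptographic:
// a real user agent would use a proper keyed hash. Illustration only.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// Hypothetical derivation: the same OS identifier and the same salt always
// yield the same exposed deviceId, so the ID is stable across capture sessions.
function deriveSurfaceDeviceId(osSurfaceId: string, salt: string): string {
  return fnv1a(salt + ":" + osSurfaceId);
}
```

Because the salt is part of the input, two origins with different salts would see unrelated IDs for the same window.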

eladalon1983 · 2024-11-04T10:40:01Z

On the one hand, yes, this seems potentially useful.

On the other hand, realistically, we cannot expect all user agents to prioritize this work in the short-term. I'd be more comfortable allowing this behavior than mandating it.

I'd also suggest scoping the persistence of the ID to successive getDisplayMedia() calls by the same Document. That is:

- If the user reloads the video conferencing application, the capturing app should get new IDs. Otherwise, we'd be exposing which tabs/windows are still there, which needlessly goes beyond the current MVP.
- If the user has tab1 dialed to origin1 and tab2 dialed to origin2, and both embed vc-in-iframe.com that calls gDM, then we should expose separate yet persistent IDs for similar reasons.
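
The per-Document scoping proposed above can be sketched as user-agent-side bookkeeping. This is only an illustration under stated assumptions: the `DocumentScopedSurfaceIds` class, the string `documentKey`, and the counter-based IDs are all hypothetical, and a real implementation would presumably use random values.

```typescript
// Hypothetical user-agent-side bookkeeping; all names are illustrative.
class DocumentScopedSurfaceIds {
  // documentKey -> (OS-level surface identifier -> exposed deviceId)
  private readonly perDocument = new Map<string, Map<string, string>>();
  private counter = 0;

  idFor(documentKey: string, osSurfaceId: string): string {
    let ids = this.perDocument.get(documentKey);
    if (ids === undefined) {
      ids = new Map<string, string>();
      this.perDocument.set(documentKey, ids);
    }
    let exposed = ids.get(osSurfaceId);
    if (exposed === undefined) {
      // Opaque and unrelated to the OS identifier. A counter keeps the sketch
      // deterministic; a real implementation would use a random value.
      this.counter += 1;
      exposed = "surface-" + this.counter;
      ids.set(osSurfaceId, exposed);
    }
    return exposed;
  }

  // A reload creates a new Document, so the old mapping is discarded.
  forgetDocument(documentKey: string): void {
    this.perDocument.delete(documentKey);
  }
}
```

Under this model, two embeds of the same iframe origin in different tabs are different Documents and therefore get separate IDs for the same captured window.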

Wdyt?

youennf · 2024-11-04T16:45:09Z

> I'd be more comfortable allowing this behavior than mandating it.

I would go with mandating the behaviour, otherwise there is no real point for a spec.
I think it is ok for the spec to go first if we are confident implementations will follow.

> If the user reloads the video conferencing application, the capturing app should get new IDs. Otherwise, we'd be exposing which tabs/windows are still there, which needlessly goes beyond the current MVP.

What are we trying to protect?
Thinking about it more, given groupId is not a stable ID, I think it is fine to do the same for display tracks.

> If the user has tab1 dialed to origin1 and tab2 dialed to origin2, and both embed vc-in-iframe.com that calls gDM, then we should expose separate yet persistent IDs for similar reasons.

Device IDs should be hashed per origin, yes, just like cameras and microphones.
Or hashed per document, like groupId.

youennf · 2024-11-04T16:47:33Z

As per the Media Capture main spec, "The group identifier MUST be uniquely generated for each [document](https://dom.spec.whatwg.org/#concept-document)." We would reuse that here.

eladalon1983 · 2024-11-05T11:52:15Z

> I would go with mandating the behaviour, otherwise there is no real point for a spec.

The authors of RFC 2119 seem to have had a different opinion, and that appears to be the industry standard atm. Specifically, I am thinking we could use "SHOULD" here.

> If the user reloads the video conferencing application, the capturing app should get new IDs. Otherwise, we'd be exposing which tabs/windows are still there, which needlessly goes beyond the current MVP.

> What are we trying to protect?

Thinking out loud:

- If I am in a video call and I share the same tab/window twice in a row during the call, it might be reasonable that the Web application knows it's the same tab.
- If I share the same tab/window the next day, this becomes a bit less expected.
- The more days pass, the more unexpected it becomes.

Is it a problem? I am not sure. But do we want to engage in a lengthy privacy review of something no Web developer has asked for yet? I worry it might not be a good use of anyone's time. If we start small and it proves useful, we could consider extensions later.

> Device IDs should be hashed per origin, yes, just like cameras and microphones.

Great, we're in agreement here. And likely this already yields the result I was after for the previous point?

youennf · 2024-11-09T11:22:48Z

Yes, I think starting with per-document IDs is ok with me.

jan-ivar · 2024-11-13T17:22:52Z

I don't find the use cases compelling.

> On successive getDisplayMedia calls within the same video call, web applications can know whether the same surface was selected or not.

This seems to invite applications to make mistaken assumptions.

- Same tab ≠ same document.
- Different tab ≠ different document.

> On surface switching, exposing deviceId can let configurationchange listeners know that a surface switch happened. This could be useful, for instance, with MSTP, which lives in a worker.

Other alternatives in w3c/mediacapture-screen-share-extensions#15 seem simpler than overloading configurationchange.

> On surface switching, it may also reveal that the same surface was already captured in the past.

Again, any assumption of what "same" means here seems unsound. Tabs may have navigated.

eladalon1983 · 2024-11-14T10:18:56Z

> I don't find the use cases compelling.

Generally, I also don't have use cases in mind at the moment, but I'd love for us to phrase the spec in a way that would not constrain Safari from offering functionality they believe in. Hence my suggestion of SHOULD.

I do have one compelling, semi-adjacent use case, though: knowing when the app is capturing the current tab. At the moment, this is possible but requires non-trivial work; see the explanation and demo. Possibly we could specify that self-capture (the app capturing its own tab) is exposed via a unique ID that's pre-specified?

> This seems to invite applications to make mistaken assumptions.

That's a valid concern, but I am not sure it's a significant one. Web developers are competent enough to either use the information correctly, or to knowingly choose to employ it as a heuristic, despite its failures. (I find the question of whether any Web developer needs this more convincing.)

jan-ivar · 2024-11-14T13:48:08Z

deviceId does not enable self-capture detection AFAIK, so that seems like a separate issue.

> I'd love for us to phrase the spec in a way that would not constrain Safari from offering functionality they believe in. Hence my suggestion of SHOULD.

I agree with @youennf that there is no real point in a spec like that.

> Web developers are competent enough to either use the information correctly, or to knowingly choose to employ it as a heuristic, despite its failures.

Some web developers are, others absolutely not.

My point here is that there are no use cases for detecting the same navigable. It is the wrong solution to a problem.

eladalon1983 · 2024-11-14T13:50:02Z

So there is no consensus on the general case. What about the following specific case?

> Possibly we could specify that self-capture (the app capturing its own tab) is exposed via a unique ID that's pre-specified?

We could say that deviceId has to be "current-browser-surface" or "current-navigable" or "this" or something like that in that case.
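
For illustration only, a page-side check for this proposal might look like the sketch below. It assumes the first of the candidate names floated above. The thread does not settle on any name, so `SELF_CAPTURE_DEVICE_ID` and `isSelfCapture` are hypothetical.

```typescript
// Hypothetical sentinel: the thread floats several names and settles on none.
const SELF_CAPTURE_DEVICE_ID = "current-browser-surface";

// Minimal stand-in for the settings object returned by track.getSettings().
type DisplaySettingsLike = { deviceId?: string };

// True only when the user agent reports the pre-specified self-capture ID.
function isSelfCapture(settings: DisplaySettingsLike): boolean {
  return settings.deviceId === SELF_CAPTURE_DEVICE_ID;
}
```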

jan-ivar · 2024-11-14T14:01:59Z

We've agreed that self-capture as a use case deserves its own API.

Self-capture as a mistake seems better solved in browser UX.

Self-capture today carries unique risks, so I'm not in favor of making it (more) detectable.

youennf · 2024-11-19T10:55:40Z

> My point here is that there are no use cases for detecting the same navigable. It is the wrong solution to a problem.

We are not talking about the same navigable but the same surface.
There are use cases, as described above, and one more below.
More generally, given native applications have this information and make use of it, I do not see why it would not be useful for web pages too.

Please also note that surface ID information already surfaces in track.label in all browsers.

> Other alternatives in w3c/mediacapture-screen-share-extensions#15 seem simpler than overloading configurationchange.

Here is a use case that is better served by configurationchange.
Let's say I am recording the screen along with the microphone. Recording is done via MSTP in a worker.
The web application wants to chapterize the recording so that users can easily navigate to specific parts of it.

A change of surface is a good input event for chapterization.
Getting the ID of the surface is nice, as it can help the user skip to the next time the same surface is shared (say the user is switching between two different screens, or two windows...).
Surface type is a partial solution; surface ID would be good.
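
The chapterization idea above can be sketched as two small functions. This assumes the recorder collects a (timestamp, deviceId) pair each time configurationchange reports a surface. The types and function names are hypothetical.

```typescript
type SurfaceSwitch = { timestampMs: number; deviceId: string };
type Chapter = { startMs: number; deviceId: string };

// Starts a new chapter whenever the captured surface changes.
function buildChapters(switches: SurfaceSwitch[]): Chapter[] {
  const chapters: Chapter[] = [];
  for (const s of switches) {
    const last = chapters[chapters.length - 1];
    if (last === undefined || last.deviceId !== s.deviceId) {
      chapters.push({ startMs: s.timestampMs, deviceId: s.deviceId });
    }
  }
  return chapters;
}

// Index of the next chapter showing the same surface as chapter `from`, or -1.
function nextChapterWithSameSurface(chapters: Chapter[], from: number): number {
  const current = chapters[from];
  if (current === undefined) return -1;
  for (let i = from + 1; i < chapters.length; i++) {
    const candidate = chapters[i];
    if (candidate !== undefined && candidate.deviceId === current.deviceId) return i;
  }
  return -1;
}
```

With only the surface type available, the second function could not tell two different windows apart, which is the gap a surface ID would fill.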

Also, the web page will want to know when the first video frame of the new surface arrives, to encode it as a key frame and to get its precise timestamp.

This issue is not solvable with the callback approach (which has the disadvantage of pausing video frames, which we do not want here).

One way to solve this is to synchronise settings/configurationchange with the enqueuing of VideoFrames in the worker (same task queue basically).
We really need to decide what we do about this synchronisation (ditto with applyConstraints promise resolution).

Another solution is to add metadata to VideoFrames.
This could end up meaning adding all video track settings to VideoFrame metadata, which would be quite a hammer.
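
To make the metadata option concrete, here is a minimal sketch of what a worker could do if each frame carried a surface ID. This is an assumption: VideoFrame metadata carries no such field today, and `FrameInfo`, `surfaceId`, and `planKeyFrames` are hypothetical names. The resulting flag is what a page would pass as the `keyFrame` option of `VideoEncoder.encode()`.

```typescript
// Hypothetical shape: VideoFrame metadata does not carry a surface ID today.
type FrameInfo = { timestampUs: number; surfaceId: string };
type EncodeDecision = { timestampUs: number; keyFrame: boolean };

// Requests a key frame for the first frame overall and for the first frame
// of every newly selected surface.
function planKeyFrames(frameInfos: FrameInfo[]): EncodeDecision[] {
  let previous: string | undefined;
  return frameInfos.map((f) => {
    const keyFrame = f.surfaceId !== previous;
    previous = f.surfaceId;
    return { timestampUs: f.timestampUs, keyFrame };
  });
}
```

Because the decision travels with the frame, no cross-thread synchronisation with configurationchange is needed in this variant.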

I'll file an issue about this.

jan-ivar · 2024-12-10T00:54:34Z

> We are not talking about the same navigable but the same surface.

In the case of tab capture, the surface is (the rendered form of) a navigable.

> Let's say I am recording the screen along with the microphone. Recording is done via MSTP in a worker.
> The web application wants to chapterize the recording so that users can easily navigate to specific parts of it.
>
> A change of surface is a good input event for chapterization.
> Getting the ID of the surface is nice, as it can help the user skip to the next time the same surface is shared (say the user is switching between two different screens, or two windows...).

I'm not sure how realistic that use case is. It seems to make a lot of assumptions, like the user having several screens or windows of reasonably similar dimensions (to produce a recording with a consistent resolution).

Maybe with tab capture and different tabs of the same window, which would all have the same resolution. But even then, it's not clear what should constitute a "chapter":

- If I source-switch to a document in a different tab, I get a chapter.
- If I navigate to a different website within the same tab, I don't get a chapter.

If we're serious about solving this use case, a better approach might be to relax track.label to update to reflect the currently loaded document. This would allow for per-origin chapters for instance.

dontcallmedom-bot · 2024-12-11T07:50:48Z

This issue had an associated resolution in WebRTC December 2024 meeting – 10 December 2024 ([screen-share] Issue 308: Should screen capture tracks expose deviceId?):

RESOLUTION: Consensus to add deviceId to settings of a track
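
With deviceId in the track settings, reading it from a capture could look like the sketch below. The small `*Like` types are stand-ins declared here so the snippet does not depend on DOM typings. In a browser the argument would presumably be `navigator.mediaDevices`. The helper names are hypothetical.

```typescript
// Minimal structural stand-ins for the DOM types involved.
type SettingsLike = { deviceId?: string; displaySurface?: string };
type VideoTrackLike = { getSettings(): SettingsLike };
type StreamLike = { getVideoTracks(): VideoTrackLike[] };
type MediaDevicesLike = {
  getDisplayMedia(constraints?: { video?: boolean }): Promise<StreamLike>;
};

// Returns the surface's deviceId, or undefined if the user agent omits it.
function surfaceIdOf(track: VideoTrackLike): string | undefined {
  return track.getSettings().deviceId;
}

// Prompts the user to pick a surface and reports its deviceId, if any.
async function captureSurfaceId(devices: MediaDevicesLike): Promise<string | undefined> {
  const stream = await devices.getDisplayMedia({ video: true });
  const track = stream.getVideoTracks()[0];
  return track === undefined ? undefined : surfaceIdOf(track);
}
```

Callers should still handle `undefined`, since not every user agent may expose the setting yet.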

youennf · 2024-12-18T12:11:56Z

I had a look at other sources (canvas capture track, webrtc remote track, web audio track).
Chrome is exposing a deviceId for all of them.
Safari and Firefox do not expose it for any of them.

jan-ivar · 2024-12-18T13:55:57Z

Please file a bug on Chromium.

youennf mentioned this issue Nov 1, 2024

Auto-pause for Captured Surface Switching (2nd edition) w3c/mediacapture-screen-share-extensions#15

Open

eladalon1983 mentioned this issue Nov 4, 2024

Spec should be more explicit about exposure of deviceId #306

Closed

youennf mentioned this issue Dec 19, 2024

Add deviceId as settings and constraints for screen share video tracks. #310

Merged

jan-ivar closed this as completed in #310 Dec 19, 2024
