Agenda for Feb 3 meeting #54

Closed
foolip opened this issue Feb 2, 2022 · 4 comments
Labels: agenda (Agenda item for the next meeting)

Comments


foolip commented Feb 2, 2022

Here's the agenda for our meeting tomorrow:

Previous meeting: #50

foolip added the agenda label Feb 2, 2022

jensimmons commented:

Under "Update on metrics" I'd like to talk about how the math was handled in 2021 scoring and how we can perhaps do better for 2022:
https://docs.google.com/spreadsheets/d/1gwMmCM_qr5Ew7GPOEWNeWxN_jj0U4wMPcoECH5zzJYI/edit?usp=sharing


foolip commented Feb 3, 2022

@jensimmons I've added #46 to the top of the agenda, since I think that explains a lot of what's going on with Compat 2021 scoring. If there's more to discuss we can do that of course.


foolip commented Feb 4, 2022

Here are the notes I took:

Clarify / clean-up Interop 2021 labeling

Jen: Conversion of test results to score. Everything gets rounded down in the step of converting each area to an integer 0-20.
Philip: I found the same when looking into #46 today.
Una: Makes sense to have equal weight per area.
Sam: Changing the rounding will mean that the numbers don’t add up to the summary.
Philip: Yes, that can probably happen if we have a data grid explaining the numbers.
Sam: If we have extra digits beyond what we show it’s not a big deal.
James: We seem to be in agreement: we don’t want to introduce truncation error. Maybe it’s not exactly the same at the 2nd decimal point, but nobody is going to complain about that. Not deliberately introducing precision loss seems reasonable.
Jen: I like the idea of trying to stick with percentages as much as possible. If someone is so invested that they notice…
Chris: So the conclusion is we won’t round until we sum them up? And when we display, we round off a little at display time.
Philip: To be pedantic, it would be fixed point math at a few levels.
Sam: I think we’re in agreement, but can review when there’s code.
Philip: Note that we can’t get the percentages we’re using from the wpt.fyi results tables, unfortunately.
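
To illustrate the truncation point above, here is a minimal sketch (with hypothetical pass rates; this is not the actual wpt.fyi scoring code) comparing the 2021-style flooring of each area to an integer 0-20 against summing at full precision and rounding only at display time:

```python
# A minimal sketch (hypothetical pass rates, not the actual wpt.fyi scoring
# code) of the truncation issue discussed above.
import math

# Hypothetical per-area pass rates for the five Compat 2021 focus areas.
pass_rates = [0.973, 0.948, 0.991, 0.962, 0.987]

# 2021-style scoring: each area floored to an integer 0-20, then summed.
floored_total = sum(math.floor(rate * 20) for rate in pass_rates)

# Alternative: sum at full precision and round only at display time.
precise_total = sum(rate * 20 for rate in pass_rates)

print(floored_total)            # 94
print(round(precise_total, 1))  # 97.2 -- over 3 points lost to flooring
```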

Score "Investigate" progress as part of the overall metric

Not revisiting the positions spreadsheet, it hasn’t changed.

Jen: Let’s say we make “investigate” a 16th bucket. If scored 0-100%, will that be the same for all browsers? Or does it somewhat depend on which browser’s representatives have done more of the work?
James: The proposal is it’s the same score for everyone, based on investigation being a collaborative effort. For interop, it’s not about one browser being better than another. The reason we think this is important is there are already a lot of incentives for implementing features, but much less incentive structure for cleanup work and making sure all browsers are interoperable. Without some visibility for this kind of work, the interop metric doesn’t add as much value as we’d like.
Chris: James, would you be OK with a separate metric along the lines of what Jen is suggesting?
James: We haven’t had time to discuss it internally.
Note: #49 (comment) posted before meeting
Sam: To summarize, including it in the main metric doesn’t provide the same incentive for all vendors to work towards it. Our preference is to have it as a separate thing as far as 2022 is concerned. That’s not necessarily an absolute objection to including it, but it’s not a definite position either.
Jen: Is this 1/16th of the total, a side number…?
James: The issue has a fairly concrete proposal, which was “some percentage of the score” divided between the investigate areas. The proposal was 15%, but we can definitely discuss details. But if it’s so little that it can be ignored, it won’t be worth it; it needs to have a meaningful chance of affecting people’s incentives.
Chris: To clarify, you think that if it’s not included in the main metric, there will be no incentive to do this work?
James: Value proposition for Interop 2022 for us is the kind of work that affects interop and web compat.
Chris: As a specific proposal, 1/16th of the score?
James: I can suggest it, but it feels low. Compat 2021 was considered a success because everyone got above 90%. That suggests that 4-5% isn’t very much, but 10% is.
Chris: 10% then? Jen?
Jen: I think that scoring as one additional bucket makes more sense than carving out a chunk of the total. I understand the argument, but we don’t agree with it. Originally Mozilla didn’t want to include things that aren’t spec’d, and this is about things that aren’t.
James: We didn’t want to score the implementation of things that aren’t specified.
Tantek: I sympathize with Jen’s observation, but we are drawing a distinction: these investigate items are about hard problems with existing implementations, which is hard work and severely under-incentivized.

Philip: If we do agree to investigate and score, are we happy with the 4 in #49?
Sam: We have no objections to any of those.
Jen: Let’s not bikeshed the weighting of the 4 areas. This is all investigation, manual testing, spec work, etc.
Sam: One could argue that Pointer Events and MouseEvent go together.

Philip: If we score the investigation efforts, any constraint on what 100% means?
Sam: I think we can figure out the scoring. The contention is around whether we include it in the main score.
Chris: How about we all see if we agree that we will score these areas, they’ll be in one combined score, and that score will appear on the dashboard in some form. We should be able to agree on that. Agreeing on that would be substantial progress.
Jen: Apple’s position is we would be fine with including investigation in the main score. The investigate bucket will be 10%, the rest will be 90% (and 15*6=90). And each of the 4 items has a group with a person driving it, and that group reports back to “the keeper of the scores”. So we’d be comfortable with 10/90%, not a higher percentage.
Sam: If we have 3 investigate categories, each would be 3.33%, about half the weight of a main area, which seems reasonable.
Jen: I also want to be clear that investigate scores are the same number for every browser, no competition on that.
James: I need to go back and vet this, but it seems very close to our proposal. I’m inclined to think we’ll be happy with this; it sounds good.
Philip: Substantial progress. Let’s revisit next week. I can look at metrics.
Chris: Can you update the RFC to restart the clock on the feedback process?
James: That would be ideal, yes.
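
For reference, a minimal sketch (hypothetical function name and numbers; not the agreed implementation) of the 90/10 split discussed above: 15 focus areas worth 6% each, plus a 10% investigate bucket whose value is the same for every browser:

```python
# A minimal sketch (hypothetical names and numbers, not the agreed
# implementation) of the 90/10 split discussed above.

NUM_FOCUS_AREAS = 15
FOCUS_WEIGHT = 0.90 / NUM_FOCUS_AREAS  # 6% per focus area (15 * 6 = 90)
INVESTIGATE_WEIGHT = 0.10              # shared investigate bucket

def interop_2022_score(area_pass_rates, investigate_progress):
    """area_pass_rates: 15 per-area pass rates in [0, 1] for one browser.
    investigate_progress: investigation progress in [0, 1], identical for
    every browser (no competition on that component)."""
    assert len(area_pass_rates) == NUM_FOCUS_AREAS
    focus = sum(rate * FOCUS_WEIGHT for rate in area_pass_rates)
    investigate = investigate_progress * INVESTIGATE_WEIGHT
    return (focus + investigate) * 100  # express as a percentage

# Hypothetical example: every area at 95% and investigations half done.
print(round(interop_2022_score([0.95] * 15, 0.5), 1))  # 90.5
```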

Dashboard update

Feb 14 launch date

Jen: That’s a week and a few days away.
James: Agree it’s a bit tight.
Philip: Understood, we’ll get a move on with the dashboard.
Jen: We’ll want to have screenshots and numbers.
Una: Targeting next week is reasonable. Aim to launch a dashboard internally for us very soon.
Jen: Can Mozilla make a decision about “investigate” today/tomorrow so that there’s time to adapt the dashboard?
James: We will ASAP. We will also need at least a week from having the RFC/dashboard ready to being ready to go and approved.


foolip commented Feb 4, 2022

I have updated web-platform-tests/rfcs#99 with the proposed 90/10% split of the metric as discussed in the meeting.
