<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc [
<!ENTITY nbsp "&#160;">
<!ENTITY zwsp "​">
<!ENTITY nbhy "‑">
<!ENTITY wj "⁠">
]>
<!-- generated by https://github.com/cabo/kramdown-rfc2629 version 1.3.32 -->
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-iab-mnqeu-report-04" number="9318" submissionType="IAB" category="info" consensus="true" obsoletes="" updates="" xml:lang="en" tocInclude="true" sortRefs="true" symRefs="true" version="3">
<!-- [rfced]
b) The boilerplate that follows the Abstract appears to be missing.
-->
<!-- Wes: sounds fine -->
<!-- xml2rfc v2v3 conversion 3.14.1 -->
<front>
<title abbrev="Measuring Network Quality for End-Users">IAB Workshop Report: Measuring Network Quality for End-Users</title>
<seriesInfo name="RFC" value="9318"/>
<author initials="W." surname="Hardaker" fullname="Wes Hardaker">
<address>
<email>[email protected]</email>
</address>
</author>
<author initials="O." surname="Shapira" fullname="Omer Shapira">
<address>
<email>[email protected]</email>
</address>
</author>
<date year="2022" month="September"/>
<keyword>QoE</keyword>
<keyword>QoS</keyword>
<keyword>Quality of Service</keyword>
<keyword>Quality of Experience</keyword>
<keyword>Measurement</keyword>
<keyword>End User</keyword>
<abstract>
<t>The Measuring Network Quality for End-Users workshop was held
virtually by the Internet Architecture Board (IAB) on September 14-16, 2021.
This report summarizes the workshop, the topics discussed, and some
preliminary conclusions drawn at the end of the workshop.</t>
<t>Note that this document is a report on the proceedings of the
workshop. The views and positions documented in this report are
those of the workshop participants and do not necessarily reflect IAB
views and positions. </t>
</abstract>
</front>
<middle>
<section anchor="introduction" numbered="true" toc="default">
<name>Introduction</name>
<t>The Internet Architecture Board (IAB) holds occasional workshops designed to
consider long-term issues and strategies for the Internet, and to suggest
future directions for the Internet architecture. This long-term planning
function of the IAB is complementary to the ongoing engineering efforts
performed by working groups of the Internet Engineering Task Force (IETF).</t>
<t>The Measuring Network Quality for End-Users workshop <xref target="WORKSHOP" format="default"/> was held
virtually by the Internet Architecture Board (IAB) on September 14-16, 2021.
This report summarizes the workshop, the topics discussed, and some preliminary
conclusions drawn at the end of the workshop.</t>
<section anchor="problem-space" numbered="true" toc="default">
<name>Problem Space</name>
<t>The Internet in 2021 is quite different from what it was 10 years ago. Today, it
is a crucial part of everyone's daily life. People use the Internet for their
social life, for their daily jobs, for routine shopping, and for keeping up
with major events. An increasing number of people can access a gigabit
connection, which would have been hard to imagine a decade ago. Additionally, thanks to
improvements in security, people trust the Internet for banking
transactions, purchasing goods, and everyday bill payments.</t>
<t>At the same time, some aspects of the end-user experience have not
improved as much. Many users have typical connection latencies that
remain at decade-old levels. Despite significant reliability
improvements in data center environments, end users also still often see
interruptions in service. Despite algorithmic advances in the field of
control theory, one still finds that the queuing delays in the
last-mile equipment exceed the accumulated transit delays. Transport
improvements, such as QUIC, Multipath TCP, and TCP Fast Open, are still
not fully supported in some networks. Likewise, various advances in
the security and privacy of user data are not widely supported, such
as encrypted DNS to the local resolver.</t>
<t>One of the major factors behind this lack of progress is the popular
perception that throughput is often the sole measure of the quality of
Internet connectivity. With such a narrow focus, the Measuring Network
Quality for End-Users workshop aimed to discuss various topics:</t>
<ul spacing="normal">
<li>What is user latency under typical working conditions?</li>
<li>How reliable is connectivity across longer time periods?</li>
<li>Do networks allow the use of a broad range of protocols?</li>
<li>What services can be run by network clients?</li>
<li>What kind of IPv4, NAT, or IPv6 connectivity is offered, and are there
firewalls?</li>
<li>What security mechanisms are available for local services, such as DNS?</li>
<li>To what degree are the privacy, confidentiality, integrity, and authenticity
of user communications guarded?</li>
<li>Improving these aspects of network quality will likely depend on
measuring and exposing metrics in a meaningful way to all involved
parties, including to end users. Such measurement and exposure of
the right metrics will allow service providers and network operators
to concentrate focus on their users' experience and will
simultaneously empower users to choose the Internet Service
Providers (ISPs) that can deliver the best experience based on their needs.</li>
<li>What are the fundamental properties of a network that contribute to
a good user experience?</li>
<li>What metrics quantify these properties, and how can we collect such metrics in a
practical way?</li>
<li>What are the best practices for interpreting those metrics and incorporating
them in a decision-making process?</li>
<li>What are the best ways to communicate these properties to service providers
and network operators?</li>
<li>How can these metrics be displayed to users in a meaningful way?</li>
</ul>
</section>
</section>
<section anchor="workshop-agenda" numbered="true" toc="default">
<name>Workshop Agenda</name>
<t>The Measuring Network Quality for End-Users workshop was divided into the
following main topic areas; see further discussion in Sections <xref target="discussions" format="counter"/> and <xref target="conclusions" format="counter"/>:</t>
<ul spacing="normal">
<li>Introduction overviews and a keynote by Vint Cerf</li>
<li>Metrics considerations</li>
<li>Cross-layer considerations</li>
<li>Synthesis</li>
<li>Group conclusions</li>
</ul>
</section>
<section anchor="positionpapers" numbered="true" toc="default">
<name>Position Papers</name>
<t>The following position papers were received for consideration by the
workshop attendees. The workshop's web page <xref target="WORKSHOP" format="default"/> contains
archives of the papers, presentations, and recorded videos.</t>
<ul spacing="normal">
<li>Ahmed Aldabbagh. "Regulatory perspective on measuring network quality for end users" <xref target="Aldabbagh2021" format="default"/></li>
<li>Al Morton. "Dream-Pipe or Pipe-Dream: What Do Users Want (and how can we assure it)?" <xref target="I-D.morton-ippm-pipe-dream" format="default"/></li>
<li>Alexander Kozlov. "The 2021 National Internet Segment Reliability Research"</li>
<li>Anna Brunstrom. "Measuring network quality - the MONROE experience"</li>
<li>Bob Briscoe, Greg White, Vidhi Goel, and Koen De Schepper. "A Single Common Metric to Characterize Varying Packet Delay" <xref target="Briscoe2021" format="default"/></li>
<li>Brandon Schlinker. "Internet Performance from Facebook's Edge" <xref target="Schlinker2019" format="default"/></li>
<li>Christoph Paasch, Kristen McIntyre, Randall Meyer, Stuart Cheshire, and Omer Shapira. "An end-user approach to the Internet Score" <xref target="McIntyre2021" format="default"/></li>
<li>Christoph Paasch, Randall Meyer, Stuart Cheshire, and Omer Shapira. "Responsiveness under Working Conditions" <xref target="I-D.cpaasch-ippm-responsiveness" format="default"/></li>
<li>Dave Reed and Levi Perigo. "Measuring ISP Performance in Broadband America: A Study of Latency Under Load" <xref target="Reed2021" format="default"/></li>
<li>Eve M. Schooler and Rick Taylor. "Non-traditional Network Metrics"</li>
<li>Gino Dion. "Focusing on latency, not throughput, to provide better internet experience and network quality" <xref target="Dion2021" format="default"/></li>
<li>Gregory Mirsky, Xiao Min, Gyan Mishra, and Liuyan Han. "The error performance metric in a packet-switched network" <xref target="Mirsky2021" format="default"/></li>
<li>Jana Iyengar. "The Internet Exists In Its Use" <xref target="Iyengar2021" format="default"/></li>
<li>Jari Arkko and Mirja Kuehlewind. "Observability is needed to improve network quality" <xref target="Arkko2021" format="default"/></li>
<li>Joachim Fabini. "Network Quality from an End User Perspective" <xref target="Fabini2021" format="default"/></li>
<li>Jonathan Foulkes. "Metrics helpful in assessing Internet Quality" <xref target="Foulkes2021" format="default"/></li>
<li>Kalevi Kilkki and Benjamin Finley. "In Search of Lost QoS" <xref target="Kilkki2021" format="default"/></li>
<li>Karthik Sundaresan, Greg White, and Steve Glennon. "Latency Measurement: What is latency and how do we measure it?"</li>
<li>Keith Winstein. "Five Observations on Measuring Network Quality for Users of Real-Time Media Applications"</li>
<li>Ken Kerpez, Jinous Shafiei, John Cioffi, Pete Chow, and Djamel Bousaber. "Wi-Fi and Broadband Data" <xref target="Kerpez2021" format="default"/></li>
<li>Kenjiro Cho. "Access Network Quality as Fitness for Purpose"</li>
<li>Koen De Schepper, Olivier Tilmans, and Gino Dion. "Challenges and opportunities of hardware support for Low Queuing Latency without Packet Loss" <xref target="DeSchepper2021" format="default"/></li>
<li>Kyle MacMillian and Nick Feamster. "Beyond Speed Test: Measuring Latency Under Load Across Different Speed Tiers" <xref target="MacMillian2021" format="default"/></li>
<li>Lucas Pardue and Sreeni Tellakula. "Lower-layer performance not indicative of upper-layer success" <xref target="Pardue2021" format="default"/></li>
<li>Matt Mathis. "Preliminary Longitudinal Study of Internet Responsiveness" <xref target="Mathis2021" format="default"/></li>
<li>Michael Welzl. "A Case for Long-Term Statistics" <xref target="Welzl2021" format="default"/></li>
<li>Mikhail Liubogoshchev. "Cross-layer Cooperation for Better Network Service" <xref target="Liubogoshchev2021" format="default"/></li>
<li>Mingrui Zhang, Vidhi Goel, and Lisong Xu. "User-Perceived Latency to Measure CCAs" <xref target="Zhang2021" format="default"/></li>
<li>Neil Davies and Peter Thompson. "Measuring Network Impact on Application Outcomes Using Quality Attenuation" <xref target="Davies2021" format="default"/></li>
<li>Olivier Bonaventure and Francois Michel. "Packet delivery time as a tie-breaker for assessing Wi-Fi access points" <xref target="Michel2021" format="default"/></li>
<li>Pedro Casas. "10 Years of Internet-QoE Measurements. Video, Cloud,
Conferencing, Web and Apps. What do we Need from the Network Side?" <xref target="Casas2021" format="default"/></li>
<li>Praveen Balasubramanian. "Transport Layer Statistics for Network Quality" <xref target="Balasubramanian2021" format="default"/></li>
<li>Rajat Ghai. "Using TCP Connect Latency for measuring CX and Network
Optimization" <xref target="Ghai2021" format="default"/></li>
<li>Robin Marx and Joris Herbots. "Merge Those Metrics: Towards Holistic (Protocol) Logging" <xref target="Marx2021" format="default"/></li>
<li>Sandor Laki, Szilveszter Nadas, Balazs Varga, and Luis M.
Contreras. "Incentive-Based Traffic Management and QoS Measurements" <xref target="Laki2021" format="default"/></li>
<li>Satadal Sengupta, Hyojoon Kim, and Jennifer Rexford. "Fine-Grained RTT Monitoring Inside the Network" <xref target="Sengupta2021" format="default"/></li>
<li>Stuart Cheshire. "The Internet is a Shared Network" <xref target="Cheshire2021" format="default"/></li>
<li>Toerless Eckert and Alex Clemm. "network-quality-eckert-clemm-00.4"</li>
<li>Vijay Sivaraman, Sharat Madanapalli, and Himal Kumar. "Measuring Network Experience Meaningfully, Accurately, and Scalably" <xref target="Sivaraman2021" format="default"/></li>
<li>Yaakov (J) Stein. "The Futility of QoS" <xref target="Stein2021" format="default"/></li>
</ul>
</section>
<section anchor="discussions" numbered="true" toc="default">
<name>Workshop Topics and Discussion</name>
<t>The agenda for the three-day workshop was broken into four separate
sections that each played a role in framing the discussions. The
workshop started with a series of introduction and problem space
presentations (<xref target="introduction-section"/>), followed by metrics considerations
(<xref target="discussion-metrics" format="default"/>), cross-layer considerations
(<xref target="discussions-cross-layer" format="default"/>), and a synthesis discussion (<xref target="synthesis" format="default"/>).
After the four subsections concluded, a follow-on discussion was held
to draw conclusions that could be agreed upon by workshop participants
(<xref target="conclusions" format="default"/>).</t>
<section anchor="introduction-section" numbered="true" toc="default">
<name>Introduction and Overviews</name>
<t>The workshop started with a broad focus on the state of user Quality
of Service (QoS) and Quality of Experience (QoE) on the Internet today.
The goal of the introductory talks was to set the stage for the
workshop by describing both the problem space and the current
solutions in place and their limitations.</t>
<t>The introduction presentations provided views of existing QoS and QoE
measurements and their effectiveness. Also discussed was the
interaction between multiple users within the network, as well as the
interaction between multiple layers of the OSI stack. Vint Cerf
provided a keynote describing the history and importance of the
topic.</t>
<section anchor="dicsucssion-intro-keynote" numbered="true" toc="default">
<name>Key Points from the Keynote by Vint Cerf</name>
<t>We may be operating in a networking space with dramatically different
parameters compared to 30 years ago. This differentiation justifies
reconsidering not only the importance of one metric over the other
but also reconsidering the entire metaphor.</t>
<t>It is time for the experts to look at not only adjusting TCP but
also exploring other protocols, such as QUIC has done lately. It's
important that we feel free to consider alternatives to TCP. TCP is
not a teddy bear, and one should not be afraid to replace it with a
transport layer with better properties that better benefit its users.</t>
<t>A suggestion: we should consider exercises to identify desirable
properties. As we are looking at the parametric spaces, one can
identify "desirable properties", as opposed to "fundamental
properties", for example, a low-latency property. An example coming
from the Advanced Research Projects Agency (ARPA): you want to know where the missile is now, not where it
was. Understanding drives particular parameter creation and selection
in the design space.</t>
<t>When parameter values are changed to extremes, such as connectivity,
alternative designs will emerge. One case study of note is the
interplanetary protocol, where "ping" is no longer indicative of
anything useful. While we look at responsiveness, we should not ignore
connectivity.</t>
<t>Unfortunately, maintaining backward compatibility is painful. The work
on designing IPv6 so as to transition from IPv4 could have been done
better if backward compatibility had been considered.
It is too late for IPv6, but it is not too late to consider this issue for potential future problems.</t>
<t>IPv6 is still not implemented fully everywhere. It's been a long road
to deployment since starting work in 1996, and we are still not
there. In 1996, the thinking was that it was quite easy to implement
IPv6, but that failed to hold true. In 1996, the dot-com boom began;
a lot of money was spent quickly, but the moment was not seized
while the market expanded exponentially. This should serve as a
cautionary tale.</t>
<t>One last point: consider performance across multiple hops in the
Internet. We've not seen many end-to-end metrics, as successfully
developing end-to-end measurements across different network and
business boundaries is quite hard to achieve. A good question to ask
when developing new protocols is "will the new protocol work across
multiple network hops?"</t>
<t>Multi-hop networks are being gradually replaced by humongous, flat
networks with sufficient connectivity between operators so that
systems become 1 hop, or 2 hops at most, away from each other
(e.g., Google, Facebook, and Amazon). The fundamental architecture of the
Internet is changing.</t>
</section>
<section anchor="discussion-introductions" numbered="true" toc="default">
<name>Introductory Talks</name>
<t>The Internet is a shared network built on IP protocols using
packet switching to interconnect multiple autonomous networks. The
Internet's departure from circuit-switching technologies allowed it to
scale beyond any other known network design. On the other hand, the
lack of in-network regulation made it difficult to ensure the best
experience for every user.</t>
<t>As Internet use cases continue to expand, it becomes increasingly
difficult to predict which network characteristics correlate with
better user experiences. Different application classes, e.g., video
streaming and teleconferencing, can affect user experience in ways that are complex
and difficult to measure. Internet utilization shifts rapidly
during the course of each day, week, and year, which further
complicates identifying key metrics capable of predicting a good user
experience.</t>
<t>QoS initiatives attempted to overcome these
difficulties by strictly prioritizing different types of
traffic. However, QoS metrics do not always correlate with user
experience. The utility of the QoS metric is further limited by the
difficulties in building solutions with the desired QoS
characteristics.</t>
<t>QoE initiatives attempted to integrate the
psychological aspects of how quality is perceived and create
statistical models designed to optimize the user experience. Despite
the high modeling effort required, the QoE approach proved beneficial only in
certain application classes. Unfortunately, generalizing the models
proved to be difficult, and the question of how different applications
affect each other when sharing the same network remains an open problem.</t>
<t>The industry's focus on giving the end user more throughput/bandwidth
led to remarkable advances. In many places around the world, a home
user enjoys gigabit speeds to their ISP. This
is so remarkable that it would have been brushed off as science
fiction a decade ago. However, the focus on increased capacity came at
the expense of neglecting another important core metric: latency. As
a result, end users whose experience is negatively affected by high
latency were advised to upgrade their equipment to get more
throughput instead. <xref target="MacMillian2021" format="default"/> showed that sometimes such an
upgrade can lead to latency improvements because, for economic
reasons, the "value-priced" data plans tend to be oversold.</t>
<t>As the industry continued to give end users more throughput, while
mostly neglecting latency concerns, application designs started to
employ various latency and short service disruption hiding techniques.
For example, a user's web browser performance experience is closely
tied to the content in the browser's local cache. While such
techniques can clearly improve the user experience when using stale
data is possible, this development further decouples user experience
from core metrics.</t>
<t>In the most recent 10 years, efforts by Dave Taht and the bufferbloat
society have led to significant progress in updating queuing algorithms to
reduce latencies under load compared to simpler FIFO
queues. Unfortunately, the home router industry has yet to implement
these algorithms, mostly due to marketing and cost concerns. Most home
router manufacturers depend on System on a Chip (SoC) acceleration to
create products with a desired throughput. SoC manufacturers opt for
simpler algorithms and aggressive aggregation, reasoning that a
higher-throughput chip will have guaranteed demand. Because consumers
are offered choices primarily among different high-throughput devices,
the perception that a higher throughput leads to a higher QoS continues to strengthen.</t>
<t>The home router is not the only place that can benefit from clearer
indications of acceptable performance for users.
Since users perceive the Internet via the lens of applications, it
is important that we call upon application vendors to adopt solutions
that stress lower latencies. Unfortunately, while bandwidth is straightforward to
measure, responsiveness is trickier. Many applications have found a
set of metrics that are helpful to their realm but do not generalize
well and cannot become universally applicable. Furthermore, due to the
highly competitive application space, vendors may have economic
reasons to avoid sharing their most useful metrics.</t>
</section>
<section anchor="discussion-introductions-summary" numbered="true" toc="default">
<name>Introductory Talks - Key Points</name>
<ol spacing="normal" type="1">
<li>Measuring bandwidth is necessary but is not alone sufficient.</li>
<li>In many cases, Internet users don't need more bandwidth but rather
need "better bandwidth", i.e., they need other connectivity improvements.</li>
<li>Users perceive the quality of their Internet connection based
on the applications they use, which are affected by a combination
of factors. There's little value in exposing a typical user to the
entire spectrum of possible reasons for the poor performance
perceived in their application-centric view.</li>
<li>Many factors affecting user experience are outside the users'
sphere of control. It's unclear whether exposing users to these
other factors will help them understand the state of their network
performance. In general, users prefer simple, categorical
choices (e.g., "good", "better", and "best" options).</li>
<li>The Internet content market is highly competitive, and many
applications develop their own "secret sauce".</li>
</ol>
</section>
</section>
<section anchor="discussion-metrics" numbered="true" toc="default">
<name>Metrics Considerations</name>
<t>In the second agenda section, the workshop continued its discussion
about metrics that can be used instead of or in addition to available
bandwidth. Several workshop attendees presented deep-dive studies on
measurement methodology.</t>
<section anchor="common-performance-metrics" numbered="true" toc="default">
<name>Common Performance Metrics</name>
<t>Losing Internet access entirely is, of course, the worst user
experience. Unfortunately, unless rebooting the home router restores
connectivity, there is little a user can do other than contacting
their service provider. Nevertheless, there is value in the systematic
collection of availability metrics on the client side; these can help
the user's ISP localize and resolve issues faster while enabling
users to better choose between ISPs. One can measure availability
directly by simply attempting connections from the client side to
distant locations of interest. For example, Ookla's <xref target="Speedtest" format="default"/>
uses a large number of Android devices to measure network and cellular
availability around the globe. Ookla collects hundreds of millions of
data points per day and uses these for accurate availability
reporting. An alternative approach is to derive availability from the
failure rates of other tests. For example, <xref target="FCC_MBA" format="default"/> and
<xref target="FCC_MBA_methodology" format="default"/> use thousands of off-the-shelf routers, with measurement software developed by
<xref target="SamKnows" format="default"/>. These routers perform an array of network tests and
report availability based on whether test connections were successful or
not.</t>
<t>Measuring available capacity can be helpful to end users, but it is
even more valuable for service providers and application
developers. High-definition video streaming requires significantly
more capacity than any other type of traffic. At the time of the
workshop, video traffic constituted 90% of overall Internet traffic
and contributed to 95% of the revenues from monetization (via
subscriptions, fees, or ads). As a result, video streaming services,
such as Netflix, need to continuously cope with rapid changes in
available capacity. The ability to measure available capacity in
real time is leveraged by the different adaptive bitrate (ABR) compression
algorithms to ensure the best possible user experience. Measuring
aggregated capacity demand allows ISPs to be
ready for traffic spikes. For example, during the end-of-year holiday
season, the global demand for capacity has been shown to be 5-7 times
higher than during other seasons. For end users, knowledge of their
capacity needs can help them select the best data plan given their
intended usage. In many cases, however, end users have more than
enough capacity, and adding more bandwidth will not improve their
experience -- after a point, it is no longer the limiting factor in
user experience. Finally, the ability to differentiate between the
"throughput" and the "goodput" can be helpful in identifying when the
network is saturated.</t>
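<t>As a small illustration of that last point (not drawn from any workshop
contribution; the sample rates and the 0.9 threshold below are assumptions
chosen only for this sketch), a path can be flagged as saturated when goodput
falls noticeably below raw throughput, for example, because retransmitted
bytes count toward throughput but not goodput:</t>
<sourcecode type="python"><![CDATA[
def saturation_suspected(throughput_bps: float,
                         goodput_bps: float,
                         ratio_threshold: float = 0.9) -> bool:
    """Heuristic sketch: goodput counts only application-useful bytes,
    while throughput counts every byte on the wire (including
    retransmissions), so a low goodput/throughput ratio suggests a
    saturated or lossy path."""
    if throughput_bps <= 0:
        return False
    return (goodput_bps / throughput_bps) < ratio_threshold

# Hypothetical sample: 100 Mbit/s on the wire but only 82 Mbit/s useful.
print(saturation_suspected(100e6, 82e6))  # True -> investigate queuing/loss
]]></sourcecode>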
<t>In measuring network quality, latency is defined as the time it takes
a packet to traverse a network path from one end to the other. At the
time of this report, users in many places worldwide can enjoy Internet
access that has adequately high capacity and availability for their
current needs. For these users, latency improvements, rather than
bandwidth improvements, can lead to the most significant improvements
in QoE. The established latency metric is a
round-trip time (RTT), commonly measured in milliseconds. However,
users often find RTT values unintuitive since, unlike other
performance metrics, high RTT values indicate poor latency and users
typically understand higher scores to be better. To address this,
<xref target="I-D.cpaasch-ippm-responsiveness" format="default"/> and <xref target="Mathis2021" format="default"/> present an inverse metric, called
"Round-trips Per Minute" (RPM).</t>
<t>There is an important distinction between "idle latency" and "latency
under working conditions". The former is measured when the network is
underused and reflects a best-case scenario. The latter is measured
when the network is under a typical workload. Until recently, typical
tools reported a network's idle latency, which can be misleading. For
example, data presented at the workshop shows that idle latencies can
be up to 25 times lower than the latency under typical working
loads. Because of this, it is essential to make a clear distinction
between the two when presenting latency to end users.</t>
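<t>When latency is presented as RPM instead, the conversion is simple
arithmetic; the small sketch below (an illustration only, not part of any
cited methodology) converts a measured round-trip time in milliseconds into
Round-trips Per Minute:</t>
<sourcecode type="python"><![CDATA[
def rtt_ms_to_rpm(rtt_ms: float) -> float:
    """Round-trips Per Minute: how many round trips of this duration fit
    into one minute (60,000 ms).  Unlike RTT, higher values are better."""
    return 60_000.0 / rtt_ms

print(rtt_ms_to_rpm(200.0))  # 300.0  RPM
print(rtt_ms_to_rpm(15.0))   # 4000.0 RPM
]]></sourcecode>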
<t>Data shows that rapid changes in capacity affect
latency. <xref target="Foulkes2021" format="default"/> attempts to quantify how often a rapid change
in capacity can cause network connectivity to become "unstable" (i.e.,
having high latency with very little throughput). Such changes in
capacity can be caused by infrastructure failures but are much more
often caused by in-network phenomena, like changing traffic
engineering policies or rapid changes in cross-traffic.</t>
<t>Data presented at the workshop shows that 36% of measured lines have
capacity metrics that vary by more than 10% throughout the day and
across multiple days. These differences are caused by many variables,
including local connectivity methods (Wi-Fi vs. Ethernet), competing
LAN traffic, device load/configuration, time of day, and local
loop/backhaul capacity. These factor variations make measuring
capacity using only an end-user device or other end-network
measurement difficult. A network router seeing aggregated traffic from
multiple devices provides a better vantage point for capacity
measurements. Such a test can account for the totality of local
traffic and perform an independent capacity test. However, various
factors might still limit the accuracy of such a test. Accurate
capacity measurement requires multiple samples.</t>
<t>As users perceive the Internet through the lens of applications, it
may be difficult to correlate changes in capacity and latency with the
quality of the end-user experience. For example, web browsers rely on
cached page versions to shorten page load times and mitigate
connectivity losses. In addition, social networking applications often
rely on prefetching their "feed" items. These techniques make the
core in-network metrics less indicative of the users' experience and
necessitate collecting data from the end-user applications themselves.</t>
<t>It is helpful to distinguish between applications that operate on a
"fixed latency budget" from those that have more tolerance to latency
variance. Cloud gaming serves as an example application that requires
a "fixed latency budget", as a sudden latency spike can decide the
"win/lose" ratio for a player. Companies that compete in the lucrative
cloud gaming market make significant infrastructure investments, such
as building entire data centers closer to their users. These investments
highlight that the economic benefit of fewer latency
spikes outweighs the associated deployment costs. On the other hand,
applications that are more tolerant to latency spikes can continue to
operate reasonably well through short spikes. Yet, even those
applications can benefit from consistently low latency depending on
usage shifts. For example, Video-on-Demand (VOD) apps can work
reasonably well when the video is consumed linearly, but once the user
tries to "switch a channel" or to "skip ahead", the user experience
suffers unless the latency is sufficiently low.</t>
<t>Finally, as applications continue to evolve, in-application metrics
are gaining in importance. For example, VOD applications can assess
the QoE by application-specific metrics, such as
whether the video player is able to use the highest possible
resolution, identifying when the video is smooth or freezing, or other
similar metrics. Application developers can then effectively use these
metrics to prioritize future work. All popular video platforms
(YouTube, Instagram, Netflix, and others) have developed frameworks to
collect and analyze VOD metrics at scale. One example is the Scuba
framework used by Meta <xref target="Scuba" format="default"/>.</t>
<t>Unfortunately, in-application metrics can be challenging to use
for comparative research purposes. First, different applications
often use different metrics to measure the same phenomena. For
example, application A may measure the smoothness of video via "mean
time to rebuffer", while application B may rely on the "probability
of rebuffering per second" for the same purpose. A different
challenge with in-application metrics is that VOD is a significant source
of revenue for companies, such as YouTube, Facebook, and Netflix,
placing a proprietary incentive against exchanging the in-application
data. A final concern centers on the privacy issues resulting from
in-application metrics that accurately describe the activities and
preferences of an individual end user.</t>
</section>
<section anchor="availability-metrics" numbered="true" toc="default">
<name>Availability Metrics</name>
<t>Availability is simply defined as whether or not a packet can be sent
and then received by its intended recipient. Availability is naively
thought to be the simplest to measure, but it is more complex when
considering that continual, instantaneous measurements would be needed
to detect the smallest of outages. Also difficult is determining the
root cause of a failure: was the user's line down, was something in
the middle of the network, or was it the service with which the user
was attempting to communicate?</t>
</section>
<section anchor="capacity-metrics" numbered="true" toc="default">
<name>Capacity Metrics</name>
<t>If the network capacity does not meet user demands, the network quality
will be impacted. Once the capacity meets the demands, increasing capacity
won't lead to further quality improvements.</t>
<t>The actual network connection capacity is determined by the equipment and the
lines along the network path, and it varies throughout the day and across
multiple days. Studies involving DSL lines in North America indicate that over
30% of the DSL lines have capacity metrics that vary by more than 10%
throughout the day and across multiple days.</t>
<t>Some factors that affect the actual capacity are:</t>
<ol spacing="normal" type="1">
<li>Presence of competing traffic, either in the LAN or in the WAN
environments. In the LAN setting, the competing traffic reflects the
multiple devices that share the Internet connection. In the WAN setting, the
competing traffic often originates from the unrelated network flows that
happen to share the same network path.</li>
<li>Capabilities of the equipment along the path of the network connection,
including the data transfer rate and the amount of memory used for
buffering.</li>
<li>Active traffic management measures, such as traffic shapers and policers
that are often used by the network providers.</li>
</ol>
<t>There are other factors that can negatively affect the actual line capacities.</t>
<t>The traffic demands follow the usage patterns and preferences of
the particular users. For example, large data transfers can use any available
capacity, while media streaming applications require only limited capacity to
function correctly. Videoconferencing applications typically need less
capacity than high-definition video streaming.</t>
</section>
<section anchor="latency-metrics" numbered="true" toc="default">
<name>Latency Metrics</name>
<t>End-to-end latency is the time that a particular packet takes to traverse the
network path from the user to their destination and back. The end-to-end
latency comprises several components:</t>
<ol spacing="normal" type="1">
<li>The propagation delay, which reflects the path distance and the individual
link technologies (e.g., fiber vs. satellite). The propagation delay doesn't depend
on the utilization of the network, to the extent that the network path
remains constant.</li>
<li>The buffering delay, which reflects the time that segments spend in the memory of
the network equipment that connects the individual network links, as well as
in the memory of the transmitting endpoint. The buffering delay depends on
the network utilization, as well as on the algorithms that govern the queued segments.</li>
<li>The transport protocol delays, which reflect the time spent in
retransmission and reassembly, as well as the time spent when the transport
is "head-of-line blocked".</li>
<li>The application delay, which reflects the inefficiencies in the application
layer; some of the workshop submissions explicitly called this component out.</li>
</ol>
<t>Typically, end-to-end latency is measured when the network is
idle. Results of such measurements mostly reflect the propagation
delay but not other kinds of delay. This report uses the term "idle
latency" to refer to results achieved under idle network conditions.</t>
<t>Alternatively, if the latency is measured when the network is under
its typical working conditions, the results reflect multiple types of
delays. This report uses the term "working latency" to refer to such
results. Other sources use the term "latency under load" (LUL) as a
synonym.</t>
<t>Data presented at the workshop reveals a substantial difference
between the idle latency and the working latency. Depending on the
traffic direction and the technology type, the working latency is
between 6 and 25 times higher than the idle latency:</t>
<table align="center">
<thead>
<tr>
<th align="left">Direction</th>
<th align="left">Technology Type</th>
<th align="left">Working Latency</th>
<th align="left">Idle Latency</th>
<th align="left">Working - Idle Difference</th>
<th align="left">Working / Idle Ratio</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Downstream</td>
<td align="left">FTTH</td>
<td align="left">148</td>
<td align="left">10</td>
<td align="left">138</td>
<td align="left">15</td>
</tr>
<tr>
<td align="left">Downstream</td>
<td align="left">Cable</td>
<td align="left">103</td>
<td align="left">13</td>
<td align="left">90</td>
<td align="left">8</td>
</tr>
<tr>
<td align="left">Downstream</td>
<td align="left">DSL</td>
<td align="left">194</td>
<td align="left">10</td>
<td align="left">184</td>
<td align="left">19</td>
</tr>
<tr>
<td align="left">Upstream</td>
<td align="left">FTTH</td>
<td align="left">207</td>
<td align="left">12</td>
<td align="left">195</td>
<td align="left">17</td>
</tr>
<tr>
<td align="left">Upstream</td>
<td align="left">Cable</td>
<td align="left">176</td>
<td align="left">27</td>
<td align="left">149</td>
<td align="left">6</td>
</tr>
<tr>
<td align="left">Upstream</td>
<td align="left">DSL</td>
<td align="left">686</td>
<td align="left">27</td>
<td align="left">659</td>
<td align="left">25</td>
</tr>
</tbody>
</table>
<t>While historically the tooling available for measuring latency focused
on measuring the idle latency, there is a trend in the industry to
start measuring the working latency as well,
e.g., Apple's <xref target="NetworkQuality" format="default"/>.</t>
</section>
<section anchor="measurement-case-studies" numbered="true" toc="default">
<name>Measurement Case Studies</name>
<t>The participants have proposed several concrete methodologies for
measuring the network quality for the end users.</t>
<t><xref target="I-D.cpaasch-ippm-responsiveness" format="default"/> introduced a methodology for measuring working latency
from the end-user vantage point. The suggested method incrementally
adds network flows between the user device and a server endpoint until
a bottleneck capacity is reached. From these measurements, a round-trip
latency is measured and reported to the end user. The authors
chose to report results with the RPM metric. The methodology had been
implemented in Apple's macOS Monterey.</t>
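<t>The general shape of such a test is sketched below. This is only an
illustrative outline under stated assumptions: start_flow,
measure_goodput_bps, and measure_rtt_ms are hypothetical placeholders for the
load-generation and probing machinery, and the flow limit, plateau threshold,
and probe count are arbitrary choices rather than the parameters of
<xref target="I-D.cpaasch-ippm-responsiveness" format="default"/>:</t>
<sourcecode type="python"><![CDATA[
import statistics
from typing import Callable

def working_latency_rpm(start_flow: Callable[[], None],
                        measure_goodput_bps: Callable[[], float],
                        measure_rtt_ms: Callable[[], float],
                        max_flows: int = 16,
                        plateau_fraction: float = 0.05,
                        probes: int = 20) -> float:
    """Rough sketch of a responsiveness-style test: add parallel flows
    until aggregate goodput stops growing (the bottleneck is saturated),
    then report the latency measured under that load as RPM."""
    previous_goodput = 0.0
    for _ in range(max_flows):
        start_flow()                     # open one more load-bearing flow
        goodput = measure_goodput_bps()  # aggregate useful throughput
        if previous_goodput and (goodput - previous_goodput) < plateau_fraction * previous_goodput:
            break                        # capacity has plateaued
        previous_goodput = goodput
    rtts = [measure_rtt_ms() for _ in range(probes)]
    return 60_000.0 / statistics.median(rtts)
]]></sourcecode>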
<t><xref target="Mathis2021" format="default"/> applied the RPM metric to the results of more than
4 billion download tests that M-Lab performed from 2010-2021. During
this time frame, the M-Lab measurement platform underwent several
upgrades that allowed the research team to compare the effect of
different TCP congestion control algorithms (CCAs) on the measured
end-to-end latency. The study showed that the use of cubic CCA leads to
increased working latency, which is attributed to its use of larger
queues.</t>
<t><xref target="Schlinker2019" format="default"/> presented a large-scale study that aimed to
establish a correlation between goodput and QoE on a
large social network. The authors performed the measurements at
multiple data centers from which video segments of set sizes were
streamed to a large number of end users. The authors used the goodput
and throughput metrics to determine whether particular paths were
congested.</t>
<t><xref target="Reed2021" format="default"/> presented the analysis of working latency measurements collected as part of the Measuring Broadband America (MBA)
program by the Federal Communication Commission (FCC). The FCC does not include working latency in its yearly report
but does offer it in the raw data files. The authors used a
subset of the raw data to identify important differences in the
working latencies across different ISPs.</t>
<t><xref target="MacMillian2021" format="default"/> presented analysis of working latency across
multiple service tiers. They found that, unsurprisingly, "premium"
tier users experienced lower working latency compared to a "value"
tier. The data demonstrated that working latency varies significantly
within each tier; one possible explanation is the difference in
equipment deployed in the homes.</t>
<t>These studies have stressed the importance of measuring working
latency. At the time of this report, many home router manufacturers
rely on hardware-accelerated routing that uses FIFO queues. Focusing
on working latency measurements on these devices and
making the consumer aware of the effect of choosing one manufacturer
vs. another can help improve the home router situation. The ideal
test would be able to identify the working latency and pinpoint
the source of the delay (home router, ISP, server side, or some network
node in between).</t>
<t>Another source of high working latency comes from network routers
exposed to cross-traffic. As <xref target="Schlinker2019" format="default"/> indicated, these can
become saturated during the peak hours of the day. Systematic testing
of the working latency in routers under load can help improve both our
understanding of latency and the impact of deployed infrastructure.</t>
</section>
<section anchor="discussions-metrics-key-points" numbered="true" toc="default">
<name>Metrics Key Points</name>
<t>The metrics for network quality can be roughly grouped into the following:</t>
<ol spacing="normal" type="1">
<li>Availability metrics, which indicate whether the user can access
the network at all.</li>
<li>Capacity metrics, which indicate whether the actual line capacity is
sufficient to meet the user's demands.</li>
<li>Latency metrics, which indicate if the user gets the data in a timely fashion.</li>
<li>Higher-order metrics, which include both the network metrics, such as
inter-packet arrival time, and the application metrics, such as the mean
time between rebuffering for video streaming.</li>
</ol>
<t>The availability metrics can be seen as a derivative of either the capacity (zero
capacity leading to zero availability) or the latency (infinite latency
leading to zero availability).</t>
<t>Key points from the presentations and discussions included the following:</t>
<ol spacing="normal" type="1">
<li>Availability and capacity are "hygienic factors" -- unless an
application is capable of using extra capacity, end users will see
little benefit from using over-provisioned lines.</li>
<li>Working latency has a stronger correlation with the user experience
than latency under an idle network load. Working latency can
exceed the idle latency by an order of magnitude.</li>
<li>The RPM metric is a stable metric, with higher values being
better, that may be more effective when communicating latency to
end users.</li>
<li>The relationship between throughput and goodput can be effective in
finding the saturation points, both in client-side <xref target="I-D.cpaasch-ippm-responsiveness" format="default"/>
and server-side <xref target="Schlinker2019" format="default"/> settings.</li>
<li>Working latency depends on the algorithm choice for addressing endpoint
congestion control and router queuing.</li>
</ol>
<t>Finally, it was commonly agreed that the best metrics are those
that are actionable.</t>
</section>
</section>
<section anchor="discussions-cross-layer" numbered="true" toc="default">
<name>Cross-Layer Considerations</name>
<t>In the cross-layer segment of the workshop, participants presented
material on and discussed how to accurately measure exactly where
problems occur. Discussion centered especially on the differences
between physically wired and wireless connections and the difficulties
of accurately determining problem spots when multiple different types
of network segments are responsible for the quality. As an example,
<xref target="Kerpez2021" format="default"/> showed that a limited bandwidth of 2.4 Ghz Wi-Fi bottlenecks the most frequently. In comparison, the wider bandwidth of
the 5 Ghz Wi-Fi has only bottlenecked in 20% of observations.</t>
<t>The participants agreed that no single component of a network
connection has all the data required to measure the effects of the
network performance on the quality of the end-user experience.</t>
<ul spacing="normal">
<li>Applications that are running on the end-user devices have the best
insight into their respective performance but have limited
visibility into the behavior of the network itself and are unable
to act based on their limited perspective.</li>
<li>ISPs have good insight into QoS
considerations but are not able to infer the effect of the QoS
metrics on the quality of end-user experiences.</li>
<li>Content providers have good insight into the aggregated behavior of
the end users but lack the insight on what aspects of network
performance are leading indicators of user behavior.</li>
</ul>
<t>The workshop identified the need for a standard and extensible way
to exchange network performance characteristics. Such an exchange
standard should address (at least) the following:</t>
<ul spacing="normal">
<li>A scalable way to capture the performance of multiple (potentially
thousands of) endpoints.</li>
<li>The data exchange format should prevent data manipulation so that
the different participants won't be able to game the mechanisms.</li>
<li>Preservation of end-user privacy. In particular, federated learning
approaches should be preferred so that no centralized entity has
access to the whole picture.</li>
<li>A transparent model for giving the different actors on a network
connection an incentive to share the performance data they collect.</li>
<li>An accompanying set of tools to analyze the data.</li>
</ul>
<section anchor="separation-of-concerns" numbered="true" toc="default">
<name>Separation of Concerns</name>
<t>Commonly, there's a tight coupling between collecting performance
metrics, interpreting those metrics, and acting upon the
interpretation. Unfortunately, such a model is not the best for
successfully exchanging cross-layer data, as:</t>
<ul spacing="normal">
<li>actors that are able to collect particular performance metrics
(e.g., the TCP RTT) do not necessarily have the context necessary for
a meaningful interpretation,</li>
<li>the actors that have the context and the computational/storage
capacity to interpret metrics do not necessarily have the ability to
control the behavior of the network/application, and</li>
<li>the actors that can control the behavior of networks and/or
applications typically do not have access to complete measurement
data.</li>
</ul>
<t>The participants agreed that it is important to separate the above
three aspects, so that:</t>
<ul spacing="normal">
<li>the different actors that have the data, but not the ability to
interpret and/or act upon it, should publish their measured data and</li>
<li>the actors that have the expertise in interpreting and synthesizing
performance data should publish the results of their interpretations.</li>
</ul>
</section>
<section anchor="security-and-privacy-considerations" numbered="true" toc="default">
<name>Security and Privacy Considerations</name>
<t>Preserving the privacy of Internet end users is a difficult
requirement to meet when addressing this problem space. There is an
intrinsic trade-off between collecting more data about user
activities and infringing on their privacy while doing so.
Participants agreed that observability across multiple layers is
necessary for an accurate measurement of the network quality, but
doing so in a way that minimizes privacy leakage is an open question.</t>
</section>
<section anchor="metric-measurement-considerations" numbered="true" toc="default">
<name>Metric Measurement Considerations</name>
<ul spacing="normal">
<li>
<t>The following TCP protocol metrics have been found to be effective
and are available for passive measurement:
</t>
<ul spacing="normal">
<li>TCP connection latency measured using selective acknowledgment (SACK) or acknowledgment (ACK) timing, and
the timing between TCP retransmission events, are good proxies for
end-to-end RTT measurements.</li>
<li>On the Linux platform, the tcp_info structure is the de facto
standard for an application to inspect the performance of
kernel-space networking; a minimal illustrative sketch follows this list. However, there is no equivalent
de facto standard for user-space networking.</li>
</ul>
</li>
<li>
<t>The QUIC and MASQUE protocols make passive performance measurements
more challenging.
</t>
<ul spacing="normal">
<li>An approach that uses federated measurement/hierarchical
aggregation may be more valuable for these protocols.</li>
<li>The QLOG format seems to be the most mature candidate for such
an exchange.</li>
</ul>
</li>
</ul>
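<t>As referenced in the list above, a minimal Linux-only sketch of passively
reading a connection's kernel-reported round-trip time via tcp_info follows.
The field offsets assume the stable leading layout of struct tcp_info from
the Linux kernel headers, and the destination host is only a placeholder:</t>
<sourcecode type="python"><![CDATA[
import socket
import struct

def smoothed_rtt_ms(sock: socket.socket) -> float:
    """Read the kernel's smoothed RTT for a connected TCP socket via
    TCP_INFO (Linux only).  tcpi_rtt (in microseconds) is the 16th
    32-bit field following the eight leading 8-bit fields of
    struct tcp_info; this assumes the stable leading layout from
    the Linux kernel headers."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    fields = struct.unpack_from("8B24I", raw)
    return fields[8 + 15] / 1000.0  # tcpi_rtt, converted to ms

# Placeholder destination; any established TCP connection can be sampled.
with socket.create_connection(("example.com", 443), timeout=5) as s:
    print(f"smoothed RTT: {smoothed_rtt_ms(s):.1f} ms")
]]></sourcecode>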
</section>
<section anchor="discussions-cross-observability" numbered="true" toc="default">
<name>Towards Improving Future Cross-Layer Observability</name>
<t>The ownership of the Internet is spread across multiple administrative
domains, making measurement of end-to-end performance data
difficult. Furthermore, the immense scale of the Internet makes
aggregation and analysis of this data difficult. <xref target="Marx2021" format="default"/> presented a
simple logging format that could potentially be used to collect and
aggregate data from different layers.</t>
<t>Another aspect of the cross-layer collaboration hampering measurement is
that the majority of current algorithms do not explicitly provide
performance data that can be used in cross-layer analysis. The IETF
community could be more diligent in identifying each protocol's key
performance indicators and exposing them as part of the protocol
specification.</t>
<t>Despite all these challenges, it should still be possible to perform
limited-scope studies in order to have a better understanding of how
user quality is affected by the interaction of the different
components that constitute the Internet. Furthermore, recent
development of federated learning algorithms suggests that it might be
possible to perform cross-layer performance measurements while
preserving user privacy.</t>
</section>
<section anchor="discussions-cross-layer-hw-tp" numbered="true" toc="default">
<name>Efficient Collaboration between Hardware and Transport Protocols</name>
<t>With the advent of the low latency, low loss, and scalable throughput
(L4S) congestion notification and control, there is an even higher
need for the transport protocols and the underlying hardware to work
in unison.</t>
<t>At the time of the workshop, the typical home router uses a single
FIFO queue that is large enough to allow amortizing the lower-layer header
overhead across multiple transport PDUs. These designs worked well
with the cubic congestion control algorithm, yet the newer generation
of algorithms can operate on much smaller queues. To fully support latencies
less than 1 ms, the home router needs to work efficiently on sequential
transmissions of just a few segments vs. being optimized for large
packet bursts.</t>
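<t>A back-of-the-envelope sketch (the buffer size and link rate below are
hypothetical, illustrative numbers) shows why a single large FIFO is at odds
with sub-millisecond latency targets:</t>
<sourcecode type="python"><![CDATA[
def fifo_drain_time_ms(buffered_bytes: int, link_rate_bps: float) -> float:
    """Worst-case queuing delay of a FIFO: queued bytes divided by line rate."""
    return buffered_bytes * 8 / link_rate_bps * 1000

# A hypothetical 256 KiB buffer on a 50 Mbit/s uplink holds ~42 ms of data,
# while a queue of just four 1500-byte segments drains in under 1 ms.
print(fifo_drain_time_ms(256 * 1024, 50e6))  # ~41.9 ms
print(fifo_drain_time_ms(4 * 1500, 50e6))    # ~0.96 ms
]]></sourcecode>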
<t>Another design trait common in home routers is the use of packet
aggregation to further amortize the overhead added by the lower-layer
headers. Specifically, multiple IP datagrams are combined into a
single, large transfer frame. However, this aggregation can add up to
10 ms to the packet sojourn delay.</t>
<t>Following the famous "you can't improve what you don't measure" adage,
it is important to expose these aggregation delays in a way that would
allow identifying the source of the bottlenecks and making hardware
more suitable for the next generation of transport protocols.</t>
</section>
<section anchor="cross-layer-keypoints" numbered="true" toc="default">
<name>Cross-Layer Key Points</name>
<ul spacing="normal">
<li>Significant differences exist in the characteristics of metrics to be measured and the optimizations required in wireless vs. wired
networks.</li>
<li>Identification of an issue's root cause is hampered by the
challenges in measuring multi-segment network paths.</li>
<li>No single component of a network connection has all the data
required to measure the effects of the complete network performance
on the quality of the end-user experience.</li>
<li>Actionable results require both proper collection and interpretation.</li>
<li>Coordination among network providers is important to successfully
improve the measurement of end-user experiences.</li>
<li>Simultaneously providing accurate measurements while preserving
end-user privacy is challenging.</li>
<li>Passive measurements from protocol implementations may provide
beneficial data.</li>
</ul>
</section>
</section>
<section anchor="synthesis" numbered="true" toc="default">
<name>Synthesis</name>
<t>Finally, in the synthesis section of the workshop, the presentations
and discussions concentrated on the next steps likely needed to make
forward progress. Of particular concern is how to bring forward
measurements that can make sense to end users trying to select
between various networking subscription options.</t>
<section anchor="measurement-and-metrics-considerations" numbered="true" toc="default">
<name>Measurement and Metrics Considerations</name>
<t>One important consideration is how decisions can be made and what actions
can be taken based on collected metrics. Measurements must be integrated
with applications in order to get true application views of
congestion, as measurements over different infrastructure or via other
applications may return incorrect results. Congestion itself can be a
temporary problem, and mitigation strategies may need to be different
depending on whether it is expected to be a short-term or long-term
phenomenon. A significant challenge exists in measuring short-term
problems, driving the need for continuous measurements to ensure
critical moments and long-term trends are captured. For short-term
problems, workshop participants debated whether an issue that goes
away is indeed a problem or is a sign that a network is properly
adapting and self-recovering.</t>
<t>Careful consideration must be given when constructing metrics in
order to understand the results. Measurements can also be affected by
individual packet characteristics -- packet delay typically has a
linear relationship with packet size. With this in mind,
measurements can be divided into a delay based on geographical
distances, a packet-size serialization delay, and a variable (noise)
delay. Each of these three sub-component delays can be different and
individually measured across each segment in a multi-hop path.
Variable delay can also be significantly impacted by external factors,
such as bufferbloat, routing changes, network load sharing, and other
local or remote changes in performance. Network measurements,
especially load-specific tests, must also be run long enough to ensure
that any problems associated with buffering, queuing, etc. are captured.
Measurement technologies should also distinguish between upstream and
downstream measurements, as well as measure the difference between
end-to-end paths and sub-path measurements.</t>
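<t>The packet-size serialization component mentioned above is the part of the
delay that grows linearly with packet size; a small sketch (the 100 Mbit/s
link rate is a hypothetical example) makes the relationship concrete:</t>
<sourcecode type="python"><![CDATA[
def serialization_delay_ms(packet_bytes: int, link_rate_bps: float) -> float:
    """Time to clock one packet onto the wire; linear in packet size."""
    return packet_bytes * 8 / link_rate_bps * 1000

# On a hypothetical 100 Mbit/s link, doubling the packet size doubles this
# component, while propagation and variable (noise) delay are unaffected.
for size in (64, 512, 1500):
    print(size, round(serialization_delay_ms(size, 100e6), 4))
# 64 -> 0.0051 ms, 512 -> 0.041 ms, 1500 -> 0.12 ms
]]></sourcecode>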
</section>
<section anchor="end-user-metrics-presentation" numbered="true" toc="default">
<name>End-User Metrics Presentation</name>
<t>Determining end-user needs requires informative measurements and
metrics. How do we provide the users with the service they need or
want? Is it possible for users to even voice their desires
effectively? Only high-level, simplistic answers like "reliability",
"capacity", and "service bundling" are typical answers given in
end-user surveys. Technical requirements that operators can consume,
like "low-latency" and "congestion avoidance", are not terms known to
and used by end users.</t>
<t>Example metrics useful to end users might include the number of users
supported by a service and the number of applications or streams that
a network can support. Example solutions to combat networking
issues include incentive-based traffic management strategies (e.g., an
application requesting lower latency may also mean accepting lower
bandwidth). User-perceived latency must be considered, not just
network latency -- users experience in-application to in-server
latency, while network-to-network measurements may only be studying the
lowest-level latency. Thus, picking the right protocol to use in a
measurement is critical in order to match user experience (for
example, users do not transmit data over ICMP, even though it is a
common measurement tool).</t>
<t>In-application measurements should consider how to measure different
types of applications, such as video streaming, file sharing,
multi-user gaming, and real-time voice communications. It may be that
asking users for what trade-offs they are willing to accept would be a
helpful approach: would they rather have a network with low latency
or a network with higher bandwidth? Gamers may make different
decisions than home office users or content producers, for example.</t>
<t>Furthermore, how can users make these trade-offs in a fair manner that
does not impact other users? There is a tension between solutions in
this space vs. the cost associated with solving these problems, as well as
which customers are willing to front these improvement costs.</t>
<t>Challenges in providing higher-priority traffic to users center
around the willingness of networks to listen to client
requests for higher priority, even though commercial interests may
not flow to them without a cost incentive. Shared media in general
are subject to oversubscription, such that the number of users a network
can support is either accurate on an underutilized network or may
assume an average bandwidth or other usage metric that fails to be
accurate during utilization spikes. Individual metrics are also
affected by in-home devices from cheap routers to microwaves and by
(multi-)user behaviors during tests. Thus, a single metric alone or a
single reading without context may not be useful in assisting a user
or operator to determine where the problem source actually is.</t>
<t>User comprehension of a network remains a challenging problem.
Multiple workshop participants argued for a single number (potentially
calculated with a weighted aggregation formula) or a small number of
measurements per expected usage (e.g., a "gaming" score vs. a "content
producer" score). Many agreed that some users may instead prefer to
consume simplified or color-coded ratings (e.g., good/better/best,
red/yellow/green, or bronze/gold/platinum).</t>
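<t>A toy sketch of such a weighted aggregation follows. It is purely
illustrative: the metric choices, normalization caps, weights, and thresholds
are assumptions made for the example, not a proposal from the workshop:</t>
<sourcecode type="python"><![CDATA[
def simple_network_score(rpm: float, capacity_mbps: float,
                         availability: float,
                         weights=(0.5, 0.2, 0.3)) -> str:
    """Collapse normalized metrics into one color-coded label.
    All caps, weights, and thresholds are arbitrary illustrations."""
    normalized = (min(rpm / 1000.0, 1.0),          # 1000 RPM treated as "enough"
                  min(capacity_mbps / 100.0, 1.0), # 100 Mbit/s treated as "enough"
                  availability)                    # fraction of successful probes
    score = sum(w * n for w, n in zip(weights, normalized))
    return "green" if score > 0.8 else "yellow" if score > 0.5 else "red"

print(simple_network_score(rpm=1500, capacity_mbps=250, availability=0.999))  # green
]]></sourcecode>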
</section>
<section anchor="synthesis-key-points" numbered="true" toc="default">
<name>Synthesis Key Points</name>
<ul spacing="normal">
<li>
<t>Some proposed metrics:</t>
<ul spacing="normal">
<li>Round-trips Per Minute (RPM)</li>
<li>users per network</li>
<li>latency</li>
<li>99% latency and bandwidth</li>
</ul>
</li>
<li>Median and mean measurements are distractions from the real problems.</li>
<li>Shared network usage greatly affects quality.</li>
<li>Long measurements are needed to capture all facets of potential
network bottlenecks.</li>
<li>Better-funded research in all these areas is needed for progress.</li>
<li>End users will best understand a simplified score or ranking system.</li>
</ul>
</section>
</section>
</section>
<section anchor="conclusions" numbered="true" toc="default">
<name>Conclusions</name>
<t>During the final hour of the three-day workshop, statements that the group deemed to be summary statements were gathered. Later, any statements that were in contention were discarded (listed further below for completeness).
For this document, the authors took the original list
and divided it into rough categories, applied some suggested edits