<pre class='metadata'>
Status: CG-DRAFT
Title: Text Fragments
ED: https://wicg.github.io/scroll-to-text-fragment/
Shortname: text-fragments
Level: 1
Editor: Nick Burris, Google https://www.google.com, [email protected]
Editor: David Bokan, Google https://www.google.com, [email protected]
Abstract: Text Fragments adds support for specifying a text snippet in the URL
fragment. When navigating to a URL with such a fragment, the user agent
can quickly emphasise and/or bring it to the user's attention.
Group: wicg
Repository: wicg/scroll-to-text-fragment
Markup Shorthands: markdown yes
WPT Display: inline
</pre>
<pre class='link-defaults'>
spec:css-display-3; type:value; for:display; text:flex
spec:css-display-3; type:value; for:display; text:grid
spec:dom; type:dfn; for:/; text:element
spec:html; type:element; text:link
spec:html; type:dfn; for:/; text:origin
spec:html; type:element; text:script
spec:html; type:element; text:style
spec:url; type:dfn; text:fragment
</pre>
<pre class="biblio">
{
"document-policy": {
"authors": [
"Ian Clelland"
],
"href": "https://w3c.github.io/webappsec-permissions-policy/document-policy.html",
"title": "Document Policy",
"status": "ED",
"publisher": "W3C",
"deliveredBy": [
"https://www.w3.org/2011/webappsec/"
]
},
"fetch-metadata": {
"authors": [
"Mike West"
],
"href": "https://w3c.github.io/webappsec-fetch-metadata/",
"title": "Fetch Metadata Request Headers",
"status": "WD",
"publisher": "W3C",
"deliveredBy": [
"https://www.w3.org/TR/fetch-metadata/"
]
}
}
</pre>
<h2 id=infrastructure>Infrastructure</h2>
<p>This specification depends on the Infra Standard. [[!INFRA]]
# Introduction # {#introduction}
<div class='note'>This section is non-normative</div>
## Use cases ## {#use-cases}
### Web text references ### {#web-text-references}
The core use case for text fragments is to allow URLs to serve as an exact text
reference across the web. For example, Wikipedia references could link to the
exact text they are quoting from a page. Similarly, search engines can serve
URLs that direct the user to the answer they are looking for in the page rather
than linking to the top of the page.
### User sharing ### {#user-sharing}
With text fragments, browsers may implement an option to 'Copy URL to here'
when the user opens the context menu on a text selection. The browser can then
generate a URL with the text selection appropriately specified, and the
recipient of the URL will have the specified text conveniently indicated.
Without text fragments, if a user wants to share a passage of text from a page,
they would likely just copy and paste the passage, in which case the receiver
loses the context of the page.
# Description # {#description}
## Indication ## {#indication}
<div class='note'>This section is non-normative</div>
This specification intentionally doesn't define what actions a user agent
should or could take to "indicate" a text match. There are different
experiences and trade-offs a user agent could make. Some examples of possible
actions:
* Providing visual emphasis or highlight of the text passage
* Automatically scrolling the passage into view when the page is navigated
* Activating a UA's find-in-page feature on the text passage
* Providing a "Click to scroll to text passage" notification
* Providing a notification when the text passage isn't found in the page
<div class='note'>
The choice of action can have implications for user security and privacy. See
the [[#security-and-privacy]] section for details.
</div>
## Syntax ## {#syntax}
<div class='note'>This section is non-normative</div>
A [=text fragment directive=] is specified in the [=fragment directive=] (see
[[#the-fragment-directive]]) with the following format:
<pre>
#:~:text=[prefix-,]textStart[,textEnd][,-suffix]
context |-------match-----| context
</pre>
<em>(Square brackets indicate an optional parameter)</em>
The text parameters are percent-decoded before matching. Dash (-), ampersand
(&), and comma (,) characters in text parameters must be percent-encoded to
avoid being interpreted as part of the text directive syntax.
The only required parameter is textStart. If only textStart is specified, the
first instance of this exact text string is the target text.
<div class="example">
<code>#:~:text=an%20example%20text%20fragment</code> indicates that the
exact text "an example text fragment" is the target text.
</div>
If the textEnd parameter is also specified, then the text directive refers to a
range of text in the page. The target text range is the text range starting at
the first instance of textStart, until the first instance of textEnd that
appears after textStart. This is equivalent to specifying the entire text range
in the textStart parameter, but allows the URL to avoid being bloated with a
long text directive.
<div class="example">
<code>#:~:text=an%20example,text%20fragment</code> indicates that the first
instance of "an example" until the following first instance of "text fragment"
is the target text.
</div>
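<div class="example">
The following non-normative sketch shows one way a script could build the simple directives above. The helper names `encodeTextParameter` and `buildTextDirective` are hypothetical and not part of this specification. It assumes `encodeURIComponent` is an acceptable encoder; that function already escapes "&" and "," but leaves "-" alone, so the dash is escaped by hand.

```javascript
// Hypothetical helper: percent-encode one text parameter so that it cannot be
// mistaken for directive syntax.
function encodeTextParameter(text) {
  // encodeURIComponent escapes "&" and "," but not "-", so handle "-" here.
  return encodeURIComponent(text).replace(/-/g, "%2D");
}

// Hypothetical helper: build a directive from textStart and an optional textEnd.
function buildTextDirective(textStart, textEnd = null) {
  const parts = [encodeTextParameter(textStart)];
  if (textEnd !== null) {
    parts.push(encodeTextParameter(textEnd));
  }
  return ":~:text=" + parts.join(",");
}

console.log(buildTextDirective("an example text fragment"));
console.log(buildTextDirective("an example", "text fragment"));
```

The result is appended after the "#" of a URL, following any ordinary fragment.
</div>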
### Context Terms ### {#context-terms}
<div class='note'>This section is non-normative</div>
The other two optional parameters are context terms. They are specified by the
dash (-) character succeeding the prefix and preceding the suffix, to
differentiate them from the textStart and textEnd parameters, as any
combination of optional parameters may be specified.
Context terms are used to disambiguate the target text fragment. The context
terms can specify the text immediately before (prefix) and immediately after
(suffix) the text fragment, allowing for whitespace.
<div class="note">
While the context terms must be the immediate text surrounding the target text
fragment, any amount of whitespace is allowed between context terms and the
text fragment. This allows context terms to span element boundaries,
for example when the target text fragment is at the beginning of a paragraph and
must be disambiguated by the previous element's text as a prefix.
</div>
The context terms are not part of the targeted text fragment and must not be
visually indicated.
<div class="example">
<code>#:~:text=this%20is-,an%20example,-text%20fragment</code> would match
to "an example" in "this is an example text fragment", but not match to "an
example" in "here is an example text".
</div>
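<div class="example">
As a rough, non-normative illustration of how context terms disambiguate a match, the sketch below searches a plain string rather than a DOM tree. The function `findWithContext` is hypothetical, and it ignores word boundaries, case folding, and element structure, all of which a real implementation has to consider.

```javascript
// Escape a literal string for use inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: find textStart in a plain string, requiring the optional
// prefix just before it and the optional suffix just after it, with any amount
// of whitespace in between. Returns the range of textStart only, or null.
function findWithContext(text, { prefix = null, textStart, suffix = null }) {
  let pattern = escapeRegExp(textStart);
  if (prefix !== null) {
    pattern = "(?<=" + escapeRegExp(prefix) + "\\s*)" + pattern;
  }
  if (suffix !== null) {
    pattern = pattern + "(?=\\s*" + escapeRegExp(suffix) + ")";
  }
  const match = new RegExp(pattern).exec(text);
  if (match === null) {
    return null;
  }
  return { start: match.index, end: match.index + match[0].length };
}

const terms = { prefix: "this is", textStart: "an example", suffix: "text fragment" };
console.log(findWithContext("this is an example text fragment", terms));
console.log(findWithContext("here is an example text", terms));
```

Because the context terms are matched with lookbehind and lookahead, the returned range covers only the target text, mirroring the rule that context terms are not indicated.
</div>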
### BiDi Considerations ### {#bidi-considerations}
<div class='note'>This section is non-normative</div>
<div class='note'>
See <a
href="https://www.w3.org/International/articles/inline-bidi-markup/uba-basics.en">Unicode
Bidirectional Algorithm basics</a> for a good overview of how Bidirectional
text works.
</div>
Since URL strings are ASCII encoded, they provide no built-in support for
bi-directional text. However, the content that we wish to target on a page may
be LTR (left-to-right), RTL (right-to-left) or both (Bidirectional/BiDi). This
section provides an intuitive description of the behavior implicitly described by
the normative sections further in this spec.
The characters of each term in the text fragment are in <em>logical order</em>,
that is, the order in which a native reader would read them (and also the
order in which characters are stored in memory).
Similarly, the <code>prefix</code> and <code>textStart</code> terms identify
text coming before another term in logical order, while <code>suffix</code> and
<code>textEnd</code> follow other terms in logical order.
Note: user agents may visually render URLs in a manner friendlier to a native
reader, for example, by converting the displayed string to Unicode. However, the
string representation of a URL remains plain ASCII characters.
<div class="example">
Suppose we want to select the text <code>مِصر</code> (Egypt, in Arabic),
that's preceded by <code>البحرين</code> (Bahrain, in Arabic). We would
first percent encode each term:
<code>مِصر</code> becomes "%D9%85%D8%B5%D8%B1" (Note: UTF-8 character
[0xD9,0x85] is the first (right-most) character of the Arabic word.)
<code>البحرين</code> becomes "%D8%A7%D9%84%D8%A8%D8%AD%D8%B1%D9%8A%D9%86"
The text fragment would then become:
<code>
:~:text=%D8%A7%D9%84%D8%A8%D8%AD%D8%B1%D9%8A%D9%86-,%D9%85%D8%B5%D8%B1
</code>
When displayed in a browser's address bar, the browser may visually render the
text in its natural RTL direction, appearing to the user:
<code>
:~:text=البحرين-,مِصر
</code>
</div>
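<div class="example">
The percent-encoded terms above can be reproduced with a few lines of script. This is a non-normative sketch that assumes `encodeURIComponent`, which encodes its input as UTF-8 in logical order. The words are written with Unicode escapes so the source stays unambiguous regardless of how an editor renders RTL text; the escapes spell the two words without vowel marks, matching the encoded bytes shown above.

```javascript
// "Bahrain" and "Egypt" in Arabic, in logical (reading) order.
const prefix = "\u0627\u0644\u0628\u062D\u0631\u064A\u0646";
const textStart = "\u0645\u0635\u0631";

// The prefix is marked by the trailing dash before the comma.
const directive =
  ":~:text=" + encodeURIComponent(prefix) + "-," + encodeURIComponent(textStart);

console.log(directive);
```
</div>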
## The Fragment Directive ## {#the-fragment-directive}
To avoid compatibility issues with usage of existing URL fragments, this spec
introduces the [=fragment directive=]. The [=fragment directive=] is a portion
of the URL [=url/fragment=] that follows the [=fragment directive delimiter=].
The <dfn>fragment directive delimiter</dfn> is the string ":~:", that is the
three consecutive code points U+003A (:), U+007E (~), U+003A (:).
<div class="note">
The [=fragment directive=] is part of the URL fragment. This means it must
always appear after a U+0023 (#) code point in a URL.
</div>
<div class="example">
To add a [=fragment directive=] to a URL like https://example.com, a fragment
must first be appended to the URL: https://example.com#:~:text=foo.
</div>
The fragment directive is meant to carry instructions, such as
<code>text=</code>, for the UA rather than for the document.
To prevent impacting page operation, it is stripped from a [=Document=]'s
[=Document/URL=] so that author scripts can't directly interact with it. This
also ensures future directives could be added without introducing breaking
changes to existing content. Potential examples could be: image-fragments,
translation-hints.
### Processing the fragment directive ### {#processing-the-fragment-directive}
The fragment directive is processed and removed from the fragment whenever the
UA sets the [=Document/URL=] on a [=Document=]. This is defined with the
following additions and changes.
To the definition of [=Document=], add:
> <strong>Monkeypatching [[DOM]]:</strong>
>
> <em>
> Each document has an associated <dfn>fragment directive</dfn> which is
> either null or an ASCII string holding data used by the UA to process the
> resource. It is initially null.
> </em>
Whenever the fragment directive is stripped from the URL, the stripped value is
stored as the Document's [=fragment directive=].
Add a series of steps that will process a fragment directive on a [=Document/URL=]:
> <strong>Monkeypatching [[DOM]]:</strong>
>
> To <dfn>process and consume fragment directive</dfn> from a [=/URL=]
> |url| and [=Document=] |document|, run these steps:
> 1. Let |raw fragment| be equal to |url|'s [=url/fragment=].
> 1. If |raw fragment| is non-null and contains the [=fragment directive
> delimiter=] as a substring:
> 1. Let |fragmentDirectivePosition| be the index of the first instance
> of the [=fragment directive delimiter=] in |raw fragment|.
> 1. Let |fragment| be the substring of |raw fragment| starting at index 0
> with length |fragmentDirectivePosition|.
> 1. Advance |fragmentDirectivePosition| by the length of [=fragment
> directive delimiter=].
> 1. Let |fragment directive| be the substring of |raw fragment| starting
> at |fragmentDirectivePosition|.
> 1. Set |url|'s [=url/fragment=] to |fragment|.
> 1. Set |document|'s [=fragment directive=] to |fragment directive|.
> <div class="note">This is stored on the document but currently not
> web-exposed</div>
<div class="note">
These changes make a URL's fragment end at the [=fragment directive
delimiter=]. The [=fragment directive=] includes all characters that follow,
but not including, the delimiter.
</div>
<div class="example">
<code>https://example.org/#test:~:text=foo</code> will be parsed such that
the fragment is the string "test" and the [=fragment directive=] is the string
"text=foo".
</div>
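<div class="example">
The splitting performed by the steps above can be sketched in script as follows. This is non-normative; `splitFragmentDirective` is a hypothetical name, and the function works on a plain string rather than on URL and Document objects.

```javascript
const FRAGMENT_DIRECTIVE_DELIMITER = ":~:";

// Hypothetical helper: split a raw fragment (without the leading "#") at the
// first occurrence of the delimiter.
function splitFragmentDirective(rawFragment) {
  if (rawFragment === null) {
    return { fragment: null, fragmentDirective: null };
  }
  const position = rawFragment.indexOf(FRAGMENT_DIRECTIVE_DELIMITER);
  if (position === -1) {
    // No delimiter: the fragment is unchanged and there is no directive.
    return { fragment: rawFragment, fragmentDirective: null };
  }
  return {
    fragment: rawFragment.substring(0, position),
    fragmentDirective: rawFragment.substring(
      position + FRAGMENT_DIRECTIVE_DELIMITER.length
    ),
  };
}

console.log(splitFragmentDirective("test:~:text=foo"));
```

Only the first delimiter is significant, so any later ":~:" stays inside the directive string.
</div>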
Amend the
<a href="https://html.spec.whatwg.org/multipage/browsing-the-web.html#initialise-the-document-object">
create and initialize a Document object</a> steps to parse and remove the
[=fragment directive=] by inserting the following steps right before
setting |document|'s [=Document/URL=]
(<a href="https://html.spec.whatwg.org/commit-snapshots/6ccb1ec8b8e79116880ea7a519d5a96fe8558afc/#initialise-the-document-object">currently</a>
step 9):
> <strong>Monkeypatching [[HTML]]:</strong>
>
> 9. Run the [=process and consume fragment directive=] steps on
> |creationURL| and |document|.
> 10. Set |document|'s [=Document/URL=] to be |creationURL|.
Amend the
<a href="https://html.spec.whatwg.org/multipage/browsing-the-web.html#traverse-the-history">
traverse the history</a> steps to process the [=fragment directive=]
during a history navigation by inserting steps before setting the |newDocument|'s URL (<a
href="https://html.spec.whatwg.org/commit-snapshots/6ccb1ec8b8e79116880ea7a519d5a96fe8558afc/#traverse-the-history">currently</a>
step 6).
> <strong>Monkeypatching [[HTML]]:</strong>
>
> 6. Let |processedURL| be a copy of <var ignore="">entry</var>'s URL.
> 7. Run the [=process and consume fragment directive=] steps on
> |processedURL| and |document|.
> 8. Set |newDocument|'s URL to |processedURL|.
<div class="note">
<p>
The changes in this section imply that a URL is only stripped of its fragment
directive when it is set on a Document. Notably, since a window's
{{Location}} object is a representation of the [=/URL=] of the [=active
document=], all getters on it will show a fragment-directive-stripped
version of the URL.
</p>
<p>
Some examples should help clarify various edge cases.
</p>
</div>
<div class="example">
```
window.location = 'https://example.com#foo:~:bar';
```
The page loads and when the document's URL is set the fragment directive is
stripped out during the "create and initialize a Document object" steps.
```
console.log(window.location.href); // 'https://example.com#foo'
console.log(window.location.hash); // '#foo'
```
Since same document navigations are made by adding a new session history
entry and using the "traverse the history" steps, the fragment directive
will be stripped here as well.
```
window.location.hash = 'fizz:~:buzz';
console.log(window.location.href); // 'https://example.com#fizz'
console.log(window.location.hash); // '#fizz'
```
The hashchange event is dispatched when only the fragment directive changes
because the comparison for it is done on the URLs in the session history
entries, where the fragment directive hasn't been removed.
```
onhashchange = () => {console.log('HASHCHANGE');};
window.location.hash = 'fizz:~:zillch'; // 'HASHCHANGE'
console.log(window.location.href); // 'https://example.com#fizz'
console.log(window.location.hash); // '#fizz'
```
</div>
<div class="example">
In other cases where a Document's URL is not set by the UA, there is no
fragment directive stripping.
For URL objects:
```
let url = new URL('https://example.com#foo:~:bar');
console.log(url.href); // 'https://example.com#foo:~:bar'
console.log(url.hash); // '#foo:~:bar'
document.url = url;
console.log(document.url.href); // 'https://example.com#foo:~:bar'
console.log(document.url.hash); // '#foo:~:bar'
```
The `<a>` or `<area>` elements:
```
<a id='anchor' href="https://example.com#foo:~:bar">Anchor</a>
<script>
console.log(anchor.href); // 'https://example.com#foo:~:bar'
console.log(anchor.hash); // '#foo:~:bar'
</script>
```
</div>
<div class="example">
History pushState will create a session history entry where the URL's
fragment directive isn't stripped. However, traversing to the entry will
cause it to set its URL on the document which will process the fragment
directive before setting it on the Document (but the fragment directive
remains on the entry).
```
history.pushState({}, 'title', 'index.html#foo:~:bar');
window.location = 'newpage.html';
// on newpage.html
history.back();
```
Results in the current document having "bar" as the fragment directive.
</div>
### Parsing the fragment directive ### {#parsing-the-fragment-directive}
A <dfn>ParsedTextDirective</dfn> is a <a spec=infra>struct</a> that consists of
four strings: <dfn for="ParsedTextDirective">textStart</dfn>,
<dfn for="ParsedTextDirective">textEnd</dfn>,
<dfn for="ParsedTextDirective">prefix</dfn>, and
<dfn for="ParsedTextDirective">suffix</dfn>. [=ParsedTextDirective/textStart=]
is required to be non-null. The other three items may be set to null,
indicating they weren't provided. The empty string is not a valid value for any
of these items.
See [[#syntax]] for what each of these components means and how they're
used.
<div algorithm="parse a text directive">
To <dfn>parse a text directive</dfn>, on an <a spec="infra">ASCII string</a> |text
directive input|, run these steps:
<div class="note">
<p>
This algorithm takes a single text directive string as input (e.g.
"text=prefix-,foo,bar") and attempts to parse the string into the
components of the directive (e.g. ("prefix", "foo", "bar", null)). See
[[#syntax]] for what each of these components means and how they're
used.
</p>
<p>
Returns null if the input is invalid or fails to parse in any way.
Otherwise, returns a [=ParsedTextDirective=].
</p>
</div>
<ol class="algorithm">
1. [=/Assert=]: |text directive input| matches the production [=TextDirective=].
1. Let |textDirectiveString| be the substring of |text directive
input| starting at index 5.
<div class="note">
This is the remainder of the |text directive input| following,
but not including, the "text=" prefix.
</div>
1. Let |tokens| be a <a for=/>list</a> of strings that is the result of
<a lt="split on commas">splitting |textDirectiveString| on commas</a>.
1. If |tokens| has size less than 1 or greater than 4, return null.
1. If any of |tokens|'s items are the empty string, return null.
1. Let |retVal| be a [=ParsedTextDirective=] with each of its items initialized
to null.
1. Let |potential prefix| be the first item of |tokens|.
1. If the last character of |potential prefix| is U+002D (-), then:
1. Set |retVal|'s [=ParsedTextDirective/prefix=] to the
[=string/percent-decode|percent-decoding=] of the result of removing the
last character from |potential prefix|.
1. <a spec=infra for=list>Remove</a> the first item of the list |tokens|.
1. Let |potential suffix| be the last item of |tokens|, if one exists, null
otherwise.
1. If |potential suffix| is non-null and its first character is U+002D (-),
then:
1. Set |retVal|'s [=ParsedTextDirective/suffix=] to the
[=string/percent-decode|percent-decoding=] of the result of removing the
first character from |potential suffix|.
1. <a spec=infra for=list>Remove</a> the last item of the list |tokens|.
1. If |tokens| has <a spec=infra for=list>size</a> other than 1 or 2, then
return null.
1. Set |retVal|'s [=ParsedTextDirective/textStart=] to the
[=string/percent-decode|percent-decoding=] of the first item of |tokens|.
1. If |tokens| has <a spec=infra for=list>size</a> 2, then set |retVal|'s
[=ParsedTextDirective/textEnd=] to the
[=string/percent-decode|percent-decoding=] of the last item of |tokens|.
1. Return |retVal|.
</ol>
</div>
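<div class="example">
The steps above can be sketched in script as shown below. This is a non-normative approximation, and `parseTextDirective` is a hypothetical name. It makes three assumptions that differ from the algorithm: the initial assertion is replaced by a simple check for the "text=" prefix, tokens are not stripped of surrounding whitespace, and `decodeURIComponent` stands in for percent-decoding, with malformed escapes treated as a parse failure.

```javascript
// Hypothetical helper following the "parse a text directive" steps.
function parseTextDirective(input) {
  // Stand-in for the assertion that the input matches TextDirective.
  if (!input.startsWith("text=")) return null;

  const tokens = input.slice(5).split(",");
  if (tokens.length < 1) return null;
  if (tokens.length > 4) return null;
  if (tokens.some((token) => token === "")) return null;

  const result = { textStart: null, textEnd: null, prefix: null, suffix: null };
  try {
    // A trailing dash on the first token marks it as the prefix.
    if (tokens[0].endsWith("-")) {
      result.prefix = decodeURIComponent(tokens[0].slice(0, -1));
      tokens.shift();
    }
    // A leading dash on the last remaining token marks it as the suffix.
    const last = tokens.length > 0 ? tokens[tokens.length - 1] : null;
    if (last !== null && last.startsWith("-")) {
      result.suffix = decodeURIComponent(last.slice(1));
      tokens.pop();
    }
    if (tokens.length !== 1 && tokens.length !== 2) return null;
    result.textStart = decodeURIComponent(tokens[0]);
    if (tokens.length === 2) {
      result.textEnd = decodeURIComponent(tokens[1]);
    }
  } catch (error) {
    // Assumption: a malformed percent escape is treated as a parse failure.
    return null;
  }
  return result;
}

console.log(parseTextDirective("text=prefix-,foo,bar"));
```
</div>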
### Fragment directive grammar ### {#fragment-directive-grammar}
A <dfn>valid fragment directive</dfn> is a sequence of characters that appears
in the [=fragment directive=] that matches the production:
<dl>
<dt>
<dfn id="fragmentdirectiveproduction">`FragmentDirective`</dfn> `::=`
</dt>
<dd>
<code>([=TextDirective=] | [=UnknownDirective=]) ("&" [=FragmentDirective=])?</code>
</dd>
<dt>
<dfn>`UnknownDirective`</dfn> `::=`
</dt>
<dd>
<code>[=CharacterString=]</code>
</dd>
<dt>
<dfn>`CharacterString`</dfn> `::=`
</dt>
<dd>
<code>([=ExplicitChar=] | [=PercentEncodedChar=])+</code>
</dd>
<dt>
<dfn>`ExplicitChar`</dfn> `::=`
</dt>
<dd>
<code>[a-zA-Z0-9] | "!" | "$" | "'" | "(" | ")" | "*" | "+" | "." | "/" | ":" |
";" | "=" | "?" | "@" | "_" | "~" | "&" | "," | "-"</code>
<div class = "note">
An [=ExplicitChar=] may be any [=URL code point=].
</div>
</dd>
</dl>
<div class="note">
The [=FragmentDirective=] may contain multiple directives split by the "&"
character. Currently this means we allow multiple text directives to enable
multiple indicated strings in the page, but this also allows for future
directive types to be added and combined. For extensibility, we do not fail to
parse if an unknown directive is in the &-separated list of directives.
</div>
The <dfn>text fragment directive</dfn> is one such [=fragment directive=] that
enables specifying a piece of text on the page, that matches the production:
<dl>
<dt><dfn>`TextDirective`</dfn> `::=`</dt>
<dd><code>"text=" [=TextDirectiveParameters=]</code></dd>
<dt><dfn>`TextDirectiveParameters`</dfn> `::=`</dt>
<dd>
<code>
([=TextDirectivePrefix=] ",")? [=TextDirectiveString=]
("," [=TextDirectiveString=])? ("," [=TextDirectiveSuffix=])?
</code>
</dd>
<dt><dfn>`TextDirectivePrefix`</dfn> `::=`</dt>
<dd><code>[=TextDirectiveString=]"-"</code></dd>
<dt><dfn>`TextDirectiveSuffix`</dfn> `::=`</dt>
<dd><code>"-"[=TextDirectiveString=]</code></dd>
<dt><dfn>`TextDirectiveString`</dfn> `::=`</dt>
<dd><code>([=TextDirectiveExplicitChar=] | [=PercentEncodedChar=])+</code></dd>
<dt><dfn>`TextDirectiveExplicitChar`</dfn> `::=`</dt>
<dd>
<code>
[a-zA-Z0-9] | "!" | "$" | "'" | "(" | ")" | "*" | "+" | "." | "/" | ":" |
";" | "=" | "?" | "@" | "_" | "~"
</code>
<div class = "note">
A [=TextDirectiveExplicitChar=] may be any [=URL code point=] that is not
explicitly used in the [=TextDirective=] syntax, that is "&", "-", and ",",
which must be percent-encoded.
</div>
</dd>
<dt><dfn>`PercentEncodedChar`</dfn> `::=`</dt>
<dd><code>"%" [a-zA-Z0-9]+</code></dd>
</dl>
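<div class="example">
The `TextDirective` production can be approximated with a regular expression, as in the non-normative sketch below. The names `isTextDirective` and `textDirectivesIn` are hypothetical. Since alphanumerics are also explicit characters, matching a single alphanumeric after "%" accepts the same strings as the one-or-more form in the grammar while avoiding ambiguous matches.

```javascript
// Character classes copied from TextDirectiveExplicitChar and PercentEncodedChar.
const EXPLICIT_CHAR = "[a-zA-Z0-9!$'()*+./:;=?@_~]";
const PERCENT_CHAR = "%[a-zA-Z0-9]";
const DIRECTIVE_STRING = `(?:${EXPLICIT_CHAR}|${PERCENT_CHAR})+`;

// "text=" (prefix "-" ",")? string ("," string)? ("," "-" suffix)?
const TEXT_DIRECTIVE = new RegExp(
  `^text=(?:${DIRECTIVE_STRING}-,)?${DIRECTIVE_STRING}` +
    `(?:,${DIRECTIVE_STRING})?(?:,-${DIRECTIVE_STRING})?$`
);

// Hypothetical helper: does the string match the TextDirective production?
function isTextDirective(directive) {
  return TEXT_DIRECTIVE.test(directive);
}

// Hypothetical helper: keep only the text directives of an &-separated list,
// skipping unknown directives rather than failing.
function textDirectivesIn(fragmentDirective) {
  return fragmentDirective.split("&").filter(isTextDirective);
}

console.log(textDirectivesIn("text=foo&unknown&text=bar,baz"));
```
</div>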
## Security and Privacy ## {#security-and-privacy}
### Motivation ### {#motivation}
<div class="note">This section is non-normative</div>
Care must be taken when implementing [=text fragment directive=] so that it
cannot be used to exfiltrate information across origins. Scripts can navigate a
page to a cross-origin URL with a [=text fragment directive=]. If a malicious
actor can determine that the text fragment was successfully found in the victim
page as a result of such a navigation, they can infer the existence of any text
on the page.
The following subsections restrict the feature to mitigate the expected attack
vectors. In summary, the text fragment directives are invoked only on full
(non-same-page) navigations that are the result of a user activation.
Additionally, navigations originating from a different origin than the
destination will require the navigation to take place in a "noopener" context,
such that the destination page is known to be sufficiently isolated.
### Scroll On Navigation ### {#scroll-on-navigation}
A UA may choose to automatically scroll a matched text passage into view. This
can be a convenient experience for the user but does present some risks that
implementing UAs should be aware of.
There are known (and potentially unknown) ways a scroll on navigation might be
detectable and distinguished from natural user scrolls.
<div class="example">
An origin embedded in an iframe in the target page registers an
IntersectionObserver and determines in the first 500ms of page load whether
a scroll has occurred. This scroll can be indicative of whether the text
fragment was successfully found on the page.
</div>
<div class="example">
Two users share the same network on which traffic is visible between them.
A malicious user sends the victim a link with a text fragment to a
page. The searched-for text appears nearby to a resource located on a unique
(on the page) domain. The attacker may be able to infer the success or failure
of the fragment search based on the order of requests for DNS lookup.
</div>
<div class="example">
A malicious page embeds a cross-origin victim in an iframe. The victim page
contains information sensitive to the user. The malicious page navigates the
victim to a text fragment. Since a successful fragment match will cause
focus, the malicious page can determine if the text appears in the victim by
listening for a blur event in its own document.
</div>
<div class="example">
An attacker sends a link to a victim, sending them to a page that displays
a private token. The attacker asks the victim to read back the token. Using
a text fragment, the attacker gets the page to load for the victim such that
warnings about keeping the token secret are scrolled out of view.
</div>
All known cases like this rely on specific circumstances about the target page
so don't apply generally. Additional restrictions on when the text fragment can
be invoked further limit what an attacker can do. Nonetheless, different
UAs can come to different conclusions about whether these risks are acceptable.
UAs should consider these factors when determining whether to scroll as part of
navigating to a text fragment.
Conforming UAs may choose not to scroll automatically on navigation. Such UAs
may, instead, provide UI to initiate the scroll ("click to scroll") or none
at all. In these cases the UA should provide some indication to the user that an
indicated passage exists further down on the page.
The examples above illustrate that in specific circumstances, it may be
possible for an attacker to extract 1 bit of information about content on the
page. However, care must be taken so that such opportunities cannot be
exploited to extract arbitrary content from the page by repeating the attack.
For this reason, restrictions based on user activation and browsing context
isolation are very important and must be implemented.
<div class="note">
Browsing context isolation ensures that no other document can script the
target document which helps reduce the attack surface.
However, it also ensures any malicious use is difficult to hide. A browsing
context that's the only one in a group must be a top level browsing context
(i.e. a full tab/window).
</div>
If a UA does choose to scroll automatically, it must ensure no scrolling is
performed while the document is in the background (for example, in an inactive
tab). This ensures any malicious usage is visible to the user and prevents
attackers from trying to secretly automate a search in background documents.
### Search Timing ### {#search-timing}
A naive implementation of the text search algorithm could allow information
exfiltration based on runtime duration differences between a matching and
non-matching query. If an attacker could find a way to synchronously navigate
to a [=text fragment directive=]-invoking URL, they would be able to determine
the existence of a text snippet by measuring how long the navigation call takes.
<div class="note">
The restrictions in [[#restricting-the-text-fragment]] should prevent this
specific case; in particular, the no-same-document-navigation restriction.
However, these restrictions are provided as multiple layers of defence.
</div>
For this reason, the implementation <em>must ensure the runtime of
[[#navigating-to-text-fragment]] steps does not differ based on whether a match
has been successfully found</em>.
This specification does not specify exactly how a UA achieves this as there are
multiple solutions with differing tradeoffs. For example, a UA <em>may</em>
continue to walk the tree even after a match is found in [=find a range from a
text directive=]. Alternatively, it <em>may</em> schedule an asynchronous task
to find and set the indicated part of the document.
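<div class="example">
The first option can be illustrated with a deliberately simplified, non-normative sketch. The function `findFirstMatchFullWalk` is hypothetical, and an array of strings stands in for the text nodes of a document. Visiting every node regardless of the outcome reduces the difference between the matching and non-matching cases, but this alone does not guarantee identical runtime.

```javascript
// Hypothetical helper: record the first match but keep scanning every node,
// so the amount of work does not depend on where (or whether) a match occurs.
function findFirstMatchFullWalk(textNodes, needle) {
  let firstMatch = null;
  for (let i = 0; i < textNodes.length; i++) {
    const offset = textNodes[i].indexOf(needle);
    if (offset !== -1 && firstMatch === null) {
      firstMatch = { nodeIndex: i, offset: offset };
    }
    // No early exit: later nodes are still searched after a match is found.
  }
  return firstMatch;
}

console.log(findFirstMatchFullWalk(["intro", "an example here", "an example again"], "example"));
```
</div>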
### Restricting the Text Fragment ### {#restricting-the-text-fragment}
Amend the definition of a [=/request=] and of a [=Document=] to include a new
field for the [=document/textFragmentToken=]:
> <strong>Monkeypatching [[FETCH]]:</strong>
>
> A [=/request=] has an associated <dfn for="request">textFragmentToken</dfn> flag
> <strong>Monkeypatching [[HTML]]:</strong>
>
> A [=Document=] has a <dfn for="document">textFragmentToken</dfn> flag that is
> consumed in order to allow a single activation of a text fragment. This flag is
> generated only during loading if the navigation occurs as a result of a user
> activation.
>
> If the [=Document=]'s [=document/textFragmentToken=] isn't consumed to activate
> a text fragment, it may be consumed to set the [=request/textFragmentToken=]
> flag of a navigation [=/request=]. In this way, a [=document/textFragmentToken=]
> can be propagated from one [=Document=] to another across a navigation.
>
> Reading either the [=Document=]'s [=document/textFragmentToken=] or the
> [=/request=]'s [=request/textFragmentToken=] must always consume the value,
> such that the token cannot be cloned.
<div class="note">
<p>
A [=document/textFragmentToken=] is generated when a [=Document=] is loaded
as a result of a user gesture. It grants its holder permission (in terms of
user activation) to activate a single text fragment. Alternatively, it may be
propagated through a navigation to allow a future document to activate a text
fragment from this navigation's user gesture.
</p>
<p>
This mechanism allows text fragments to activate through a common redirect
technique used by many popular web sites. Such sites redirect users to
their intended destination by responding with a 200 status code and a page
containing script that sets <tt>window.location</tt>.
</p>
<p>
Unlike real HTTP (<tt>status 3xx</tt>) redirects, these "client-side"
redirects cannot propagate the fact that the navigation is the result of a
user gesture. The [=document/textFragmentToken=] mechanism allows this
specifically scoped user activation to be passed through such navigations.
This means a page can programmatically navigate to a text fragment, a
single time, as if it had a user gesture. However, further navigations
require a new user gesture.
</p>
<p>
The following diagram demonstrates how the token is used to activate a text
fragment through a client-side redirect service:
</p>
<img style="margin-left:auto;margin-right:auto;display:block"
src="https://raw.githubusercontent.com/WICG/scroll-to-text-fragment/master/text_fragment_token.png"
alt="Diagram showing how a text fragment token is created and used">
<p>
See [redirects.md](redirects.md) for a more in-depth discussion.
</p>
</div>
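<div class="example">
The consume-on-read and propagation behavior described above can be modeled
with a short sketch. The class and function names here are hypothetical and
exist only for illustration; the sketch assumes a single boolean flag on both
documents and requests, as in the monkeypatches above.

```python
from dataclasses import dataclass


@dataclass
class TokenHolder:
    """Hypothetical stand-in for a Document or a request carrying the flag."""
    text_fragment_token: bool = False

    def consume_token(self) -> bool:
        # Reading always clears the flag, so the token cannot be cloned.
        value = self.text_fragment_token
        self.text_fragment_token = False
        return value


def start_navigation(source_document: TokenHolder) -> TokenHolder:
    """Move the source document's token (if any) onto a new navigation request."""
    return TokenHolder(text_fragment_token=source_document.consume_token())


def create_document(request: TokenHolder, is_user_activated: bool,
                    is_top_level: bool) -> TokenHolder:
    """Give the new document a token if it is top-level and the navigation was
    user activated or the request carried a token."""
    request_token = request.consume_token()
    has_token = is_top_level and (is_user_activated or request_token)
    return TokenHolder(text_fragment_token=has_token)
```

In this model, a user click that loads a client-side redirect page creates one
token. The redirect page's script-initiated navigation carries that token to
the destination, and a second script-initiated navigation from the same page
carries none.
</div>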
> <strong>Monkeypatching [[HTML]]:</strong>
>
> A [=Document=] has an <dfn for="document">allowTextFragmentDirective</dfn>
> flag that is used to determine whether a text fragment directive should be
> allowed to activate. If this flag is false, the text fragment must not
> cause any observable effects.
<div class="note">
<p>
[=document/textFragmentToken=] is analogous to a user-activation state
while [=document/allowTextFragmentDirective=] is more comprehensive, taking into
account various pieces of information, one of which is the existence of a
[=document/textFragmentToken=].
</p>
<p>
The reason we compute allowTextFragmentDirective and keep it as a flag,
rather than performing the checks at the time of use, is that it relies on
the properties of the navigation while the invocation will occur as part of
the <a spec=HTML>scroll to the fragment</a> steps which can happen outside
the context of a navigation.
</p>
</div>
<div class="note">
TODO: This should really only prevent potentially observable side-effects like
automatic scrolling. Unobservable effects like a highlight could be safely
allowed in all cases.
</div>
Amend the <a
href="https://html.spec.whatwg.org/multipage/browsing-the-web.html#initialise-the-document-object">create
and initialize a Document object</a> steps by adding the following steps before returning |document|:
> <strong>Monkeypatching [[HTML]]:</strong>
>
> 15. Set the [=document/textFragmentToken=] flag on |document|:
> 1. Let |is user activated| be true if the current navigation was initiated from
> a window that had a <a spec="html">transient activation</a> at the time the
> navigation was initiated, or the UA has reason to believe it comes from a
> direct user gesture (e.g. user typed into the address bar).
> <div class="note">
> TODO: it'd be better to refer to the userActivationFlag on the
> |request|. See
> <a href="https://w3c.github.io/webappsec-fetch-metadata/#request-user-activation-flag">Sec-Fetch-User</a> in [[FETCH-METADATA]].
> </div>
> 1. If <var ignore=''>browsing context</var> is a top-level browsing context and if either of |is
> user activated| or the [=request/textFragmentToken=] flag of
> |navigationParam|'s
> <a href="https://html.spec.whatwg.org/multipage/browsing-the-web.html#navigation-params-request">request</a>
> object is true, set the |document|'s [=document/textFragmentToken=]
> flag to true. Otherwise, set it to false.
> <div class="note">
> It's important that the token not be copyable so that at most one token
> is created per user-activated navigation.
> </div>
> 16. Set the [=document/allowTextFragmentDirective=] flag on |document| by
> following these sub-steps:
> 1. If |document|'s [=fragment directive=] field is null or empty, set
> [=document/allowTextFragmentDirective=] to false and abort these sub-steps.
> 1. Let |textFragmentToken| be the value of |document|'s
> [=document/textFragmentToken=] and set |document|'s
> [=document/textFragmentToken=] to false.
> 1. If the |navigationParam|'s
> <a href="https://html.spec.whatwg.org/multipage/browsing-the-web.html#navigation-params-request">request</a>
> has a <a href="https://w3c.github.io/webappsec-fetch-metadata/#http-headerdef-sec-fetch-site">sec-fetch-site</a>
> header and its value is `"none"`, set [=document/allowTextFragmentDirective=] to true and abort these sub-steps.
> <div class="note">
> <p>
> If a navigation originates from browser UI, it's always ok to allow it
> since it'll be user triggered and the page/script isn't providing the
> text snippet.
> </p>
> <p>
> Note: Depending on the UA, there may be cases where the <var
> ignore=''>incumbentNavigationOrigin</var> parameter is null but
> it's not clear that the navigation should be considered as
> initiated from browser UI. E.g. an "open in new window" context
> menu item when right clicking on a link. The intent in this item
> is to distinguish cases where the app/page is able to set the URL
> from those that are fully under the user's control. In the former
> we want to prevent activation of the text fragment unless the
> destination is loaded in a separate browsing context group (so that
> the source cannot both control the text snippet and observe
> side-effects in the navigation).
> </p>
> <p>
> See <a
> href="https://w3c.github.io/webappsec-fetch-metadata/#directly-user-initiated">sec-fetch-site</a>
> for a more detailed discussion of how this should apply.
> </p>
> </div>
> 1. If |textFragmentToken| is false, set
> [=document/allowTextFragmentDirective=] to false and abort these sub-steps.
> 1. If the [=document=] of the <a spec=HTML>latest entry</a> in
> |document|'s [=Document/browsing context=]'s <a spec=HTML>session history</a> is
> equal to |document|, set [=document/allowTextFragmentDirective=] to false
> and abort these sub-steps.
> <div class="note">
> i.e. Forbidden on a same-document navigation.
> </div>
> 1. If the |navigationParam|'s
> <a href="https://html.spec.whatwg.org/multipage/browsing-the-web.html#navigation-params-request">request</a>
> has a <a href="https://w3c.github.io/webappsec-fetch-metadata/#http-headerdef-sec-fetch-site">sec-fetch-site</a>
> header and its value is `"same-origin"`, set
> [=document/allowTextFragmentDirective=] to true and abort these
> sub-steps.
> 1. If |document|'s [=Document/browsing context=] is a [=top-level browsing
> context=] and its
> <a href="https://html.spec.whatwg.org/multipage/browsers.html#tlbc-group">group</a>'s
> <a spec=HTML>browsing context set</a> has length 1, set
> [=document/allowTextFragmentDirective=] to true and abort these sub-steps.
> <div class="note">
> i.e. Only allow navigation from a cross-origin element/script if the
> document is loaded in a noopener context. That is, a new top level
> browsing context group to which the navigator does not have script access
> and which may be placed into a separate process.
> </div>
> 1. Otherwise, set [=document/allowTextFragmentDirective=] to false.
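<div class="example">
The sub-steps of step 16 can be read as a sequence of early returns. The sketch
below restates them under simplifying assumptions: every input is gathered into
a hypothetical `NavigationInfo` record, the same-document check is reduced to a
boolean, and the browsing context group is represented only by its size. None
of these names come from HTML or Fetch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NavigationInfo:
    """Hypothetical bundle of the inputs consulted by the sub-steps."""
    fragment_directive: Optional[str]
    text_fragment_token: bool
    sec_fetch_site: Optional[str]
    is_same_document: bool
    is_top_level: bool
    browsing_context_group_size: int


def allow_text_fragment_directive(nav: NavigationInfo) -> bool:
    # No directive: nothing to allow, and the token is left untouched.
    if not nav.fragment_directive:
        return False
    # Reading the token consumes it.
    token = nav.text_fragment_token
    nav.text_fragment_token = False
    # Navigations from browser UI are always allowed.
    if nav.sec_fetch_site == "none":
        return True
    # Everything else needs a token.
    if not token:
        return False
    # Forbidden on a same-document navigation.
    if nav.is_same_document:
        return False
    if nav.sec_fetch_site == "same-origin":
        return True
    # Cross-origin initiators are allowed only in a noopener-like context.
    if nav.is_top_level and nav.browsing_context_group_size == 1:
        return True
    return False
```

The ordering matters: the browser-UI case is checked before the token, so a
typed URL is allowed even without a token, while the same-origin case is
checked only after the same-document restriction.
</div>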
Amend step 2 of the
<a href="https://html.spec.whatwg.org/multipage/browsing-the-web.html#process-a-navigate-fetch">
process a navigate fetch</a> steps to additionally set |request|'s
[=request/textFragmentToken=] to the value of the [=active document=]'s
[=document/textFragmentToken=] and set the [=active document=]'s value to
false.
> <strong>Monkeypatching [[HTML]]:</strong>
>
> 2. Set request's client to sourceBrowsingContext's active document's relevant
> settings object, destination to "document", mode to "navigate", credentials
> mode to "include", use-URL-credentials flag, redirect mode to "manual",
> replaces client id to browsingContext's active document's relevant settings
> object's id, and [=request/textFragmentToken=] to
> sourceBrowsingContext's active document's
> [=document/textFragmentToken=]. Set sourceBrowsingContext's active
> document's [=document/textFragmentToken=] to false.
Amend the <a spec=HTML>try to scroll to the fragment</a> steps by replacing the
steps of the task queued in step 2:
> <strong>Monkeypatching [[HTML]]:</strong>
>
> 1. If document has no parser, or its parser has stopped parsing, or the user
> agent has reason to believe the user is no longer interested in scrolling to
> the fragment, then clear <em>document</em>'s
> [=allowTextFragmentDirective=] flag and abort these steps.
> 2. Scroll to the fragment given in document's URL. If this does not find an
> indicated part of the document, then try to scroll to the fragment for
> document.
> 3. Clear <em>document</em>'s [=allowTextFragmentDirective=] flag.
## Navigating to a Text Fragment ## {#navigating-to-text-fragment}
<div class="note">
The text fragment specification proposes an amendment to
[[html#scroll-to-fragid]]. In summary, if a [=text fragment directive=] is
present and a match is found in the page, the text fragment takes precedence over
the element fragment as the indicated part of the document. We amend the
indicated part of the document to optionally include a [=range=] that
may be scrolled into view instead of the containing element.
</div>
Replace step 3.1 of the <a spec=HTML>scroll to the fragment</a> algorithm with
the following:
> <strong>Monkeypatching [[HTML]]:</strong>
>
> 1. Let <em>target, range</em> be the [=/element=] and [=range=] that are
> <a spec=HTML>the indicated part of the document</a>.
Replace step 3.3 of the <a spec=HTML>scroll to the fragment</a> algorithm with
the following:
> <strong>Monkeypatching [[HTML]]:</strong>
>
> 3. <a href="https://w3c.github.io/webappsec-permissions-policy/document-policy.html#algo-get-policy-value">Get
> the policy value</a> for `force-load-at-top` in the
> [=Document=]. If the result is true, abort these steps.
> 4. If <em>range</em> is non-null:
> 1. If the UA supports scrolling of text fragments on navigation, invoke
> [=scroll a Range into view|Scroll range into view=], with range
> <em>range</em>, containingElement <em>target</em>, <em>behavior</em> set
> to "auto", <em>block</em> set to "center", and <em>inline</em> set to
> "nearest".
> 5. Otherwise:
> 1. <a spec=cssom-view lt="scroll an element into view">Scroll target
> into view</a>, with <em>behavior</em> set to "auto", <em>block</em>
> set to "start", and <em>inline</em> set to "nearest".
> <div class="note">
> This otherwise case is the same as the current step 3.3.
> </div>
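<div class="example">
The branching in the replacement steps above can be summarized as a small
decision function. This is a sketch only: the function name, its boolean
inputs, and the returned dictionary are assumptions for illustration, and the
actual scrolling is performed by the CSSOM View algorithms referenced in the
steps.

```python
from typing import Dict, Optional


def plan_scroll(force_load_at_top: bool, has_range: bool,
                ua_scrolls_text_fragments: bool = True) -> Optional[Dict[str, str]]:
    """Return the scroll options the steps would use, or None if nothing scrolls."""
    # The force-load-at-top document policy suppresses scrolling entirely.
    if force_load_at_top:
        return None
    if has_range:
        # A text fragment match is centered in the block direction.
        if not ua_scrolls_text_fragments:
            return None
        return {"what": "range", "behavior": "auto",
                "block": "center", "inline": "nearest"}
    # No range: same as the existing element-fragment behavior.
    return {"what": "target", "behavior": "auto",
            "block": "start", "inline": "nearest"}
```
</div>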
Add the following steps to the beginning of the processing model for
<a spec=HTML>the indicated part of the document</a>:
> <strong>Monkeypatching [[HTML]]:</strong>
>
> 1. Let |fragment directive string| be the document's [=fragment directive=].
> 1. If the document's [=allowTextFragmentDirective=] flag is true then:
> 1. Let |ranges| be a <a spec=infra>list</a> that is the result of running
> the [=process a fragment directive=] steps with |fragment directive
> string| and the document.
> 1. If |ranges| is non-empty, then:
> 1. Let |range| be the first item of |ranges|.
> <div class="note">
> The first [=range=] in |ranges| is specifically
> scrolled into view. This [=range=], along with the
> remaining |ranges| should be visually indicated in a way that
> is not revealed to script, which is left as UA-defined behavior.
> </div>
> 1. Let |node| be the [=first common ancestor=] of |range|'s
> [=range/start node=] and [=range/end node=].
> 1. While |node| is non-null and is not an [=element=], set |node| to
> |node|'s [=tree/parent=].
> 1. The indicated part of the document is |node| and |range|; return.
<div algorithm="first common ancestor">
To find the <dfn>first common ancestor</dfn> of two nodes |nodeA| and |nodeB|,
follow these steps:
<ol class="algorithm">
1. Let |commonAncestor| be |nodeA|.
1. While |commonAncestor| is non-null and is not a [=shadow-including inclusive
ancestor=] of |nodeB|, let |commonAncestor| be |commonAncestor|’s
[=shadow-including parent=].
1. Return |commonAncestor|.
</ol>
</div>
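<div class="example">
The first common ancestor walk, together with the shadow-including parent it
relies on and the subsequent climb to the nearest element, can be sketched over
a toy node type. The `Node` class below is hypothetical and models only a
parent pointer, a host pointer for shadow roots, and an element flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    """Hypothetical minimal node; host is set only on shadow roots."""
    name: str
    parent: Optional["Node"] = None
    host: Optional["Node"] = None
    is_element: bool = True


def shadow_including_parent(node: Node) -> Optional[Node]:
    # A shadow root's shadow-including parent is its host.
    return node.host if node.host is not None else node.parent


def is_shadow_including_inclusive_ancestor(candidate: Node, node: Node) -> bool:
    current: Optional[Node] = node
    while current is not None:
        if current is candidate:
            return True
        current = shadow_including_parent(current)
    return False


def first_common_ancestor(node_a: Node, node_b: Node) -> Optional[Node]:
    common: Optional[Node] = node_a
    while common is not None and not is_shadow_including_inclusive_ancestor(common, node_b):
        common = shadow_including_parent(common)
    return common


def nearest_element(node: Optional[Node]) -> Optional[Node]:
    # Mirrors the "while node is non-null and is not an element" step,
    # which follows the ordinary tree parent.
    while node is not None and not node.is_element:
        node = node.parent
    return node
```

For a range that starts in light-DOM text and ends inside a shadow tree hosted
by a sibling element, the walk climbs out of the shadow root through its host
and stops at the closest node that contains both endpoints.
</div>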
<div algorithm="shadow-including parent">
To find the <dfn>shadow-including parent</dfn> of |node| follow these steps:
<ol class="algorithm">
1. If |node| is a [=/shadow root=], return |node|'s [=DocumentFragment/host=].
1. Otherwise, return |node|'s [=tree/parent=].
</ol>