-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathspec.html
253 lines (253 loc) · 85 KB
/
spec.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
<h1 id="introduction">Introduction</h1>
<p>This specification describes a mechanism for splitting a byte stream into blocks of varying size with split boundaries based solely on the content of the input. It also describes a mechanism for organizing those blocks into a (probabilistically) balanced tree whose shape is likewise determined solely by the content of the input.</p>
<p>The general technique has been used by various systems such as:</p>
<ul>
<li><a href="https://perkeep.org">Perkeep</a></li>
<li><a href="https://bup.github.io/">Bup</a></li>
<li><a href="https://rsync.samba.org/">RSync</a></li>
<li><a href="https://pdos.csail.mit.edu/papers/lbfs:sosp01/lbfs.pdf">Low-Bandwidth Network Filesystem (LBFS)</a></li>
<li><a href="https://syncthing.net/">Syncthing</a></li>
<li><a href="https://github.com/kopia/kopia">Kopia</a></li>
</ul>
<p>...and many others. The technique permits the efficient representation of slightly different versions of the same data (e.g. successive revisions of a file in a version control system), since changes in one part of the input generally do not affect the boundaries of any but the adjacent blocks.</p>
<p>However, the exact functions used by these systems differ in details, and thus do not produce identical splits, making interoperability for some use cases more difficult than it should be.</p>
<p>The role of this specification is therefore to fully and formally describe a concrete function on which future systems may standardize, improving interoperability.</p>
<h1 id="notation">Notation</h1>
<p>This section discusses notation used in this specification.</p>
<p>We define the following sets:</p>
<ul>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>U</mi><mn>32</mn></msub><annotation encoding="application/x-tex">U_{32}</annotation></semantics></math>, The set of integers in the range <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">[</mo><mn>0</mn><mo>,</mo><msup><mn>2</mn><mn>32</mn></msup><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">[0, 2^{32})</annotation></semantics></math>.</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>U</mi><mn>8</mn></msub><annotation encoding="application/x-tex">U_8</annotation></semantics></math>, The set of integers in the range <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">[</mo><mn>0</mn><mo>,</mo><msup><mn>2</mn><mn>8</mn></msup><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">[0, 2^8)</annotation></semantics></math>, aka bytes.</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>V</mi><mn>8</mn></msub><annotation encoding="application/x-tex">V_8</annotation></semantics></math>, The set of <em>sequences</em> of bytes, i.e. sequences of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>U</mi><mn>8</mn></msub><annotation encoding="application/x-tex">U_8</annotation></semantics></math>.</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>V</mi><mi>v</mi></msub><annotation encoding="application/x-tex">V_v</annotation></semantics></math>, The set of <em>sequences</em> of <em>sequences</em> of bytes, i.e. sequences of elements of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>V</mi><mn>8</mn></msub><annotation encoding="application/x-tex">V_8</annotation></semantics></math>.</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>V</mi><mn>32</mn></msub><annotation encoding="application/x-tex">V_{32}</annotation></semantics></math>, The set of sequences of elements of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>U</mi><mn>32</mn></msub><annotation encoding="application/x-tex">U_{32}</annotation></semantics></math>.</li>
</ul>
<p>All arithmetic operations in this document are implicitly performed modulo <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msup><mn>2</mn><mn>32</mn></msup><annotation encoding="application/x-tex">2^{32}</annotation></semantics></math>. We use standard mathematical notation for addition, subtraction, multiplication, and exponentiation. Division always denotes integer division, i.e. any remainder is dropped.</p>
<p>Numerals staring with the prefix <code>0x</code> are hexadecimal, e.g. <code>0xfe</code> for the (decimal) number 254</p>
<p>We use the notation <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mn>0</mn></msub><mo>,</mo><msub><mi>X</mi><mn>1</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mi>k</mi></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">\langle X_0, X_1, \dots, X_k \rangle</annotation></semantics></math> to denote an ordered sequence of values.</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo></mrow><annotation encoding="application/x-tex">|X|</annotation></semantics></math> denotes the length of the sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math>, i.e. the number of elements it contains.</p>
<p>We also use the following operators and functions:</p>
<ul>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>∧</mo><mi>y</mi></mrow><annotation encoding="application/x-tex">x \wedge y</annotation></semantics></math> denotes the bitwise AND of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>x</mi><annotation encoding="application/x-tex">x</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>y</mi><annotation encoding="application/x-tex">y</annotation></semantics></math></li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>∨</mo><mi>y</mi></mrow><annotation encoding="application/x-tex">x \vee y</annotation></semantics></math> denotes the bitwise <em>inclusive</em> OR of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>x</mi><annotation encoding="application/x-tex">x</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>y</mi><annotation encoding="application/x-tex">y</annotation></semantics></math></li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>⊕</mo><mi>y</mi></mrow><annotation encoding="application/x-tex">x \oplus y</annotation></semantics></math> denotes the bitwise <em>exclusive</em> OR of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>x</mi><annotation encoding="application/x-tex">x</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>y</mi><annotation encoding="application/x-tex">y</annotation></semantics></math></li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>≪</mo><mi>n</mi></mrow><annotation encoding="application/x-tex">x \ll n</annotation></semantics></math> denotes shifting <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>x</mi><annotation encoding="application/x-tex">x</annotation></semantics></math> to the left <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>n</mi><annotation encoding="application/x-tex">n</annotation></semantics></math> bits, i.e. <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>≪</mo><mi>n</mi><mo>=</mo><mi>x</mi><msup><mn>2</mn><mi>n</mi></msup></mrow><annotation encoding="application/x-tex">x \ll n = x2^{n}</annotation></semantics></math></li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>≫</mo><mi>n</mi></mrow><annotation encoding="application/x-tex">x \gg n</annotation></semantics></math> denotes a <em>logical</em> right shift -- it shifts <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>x</mi><annotation encoding="application/x-tex">x</annotation></semantics></math> to the right by <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>n</mi><annotation encoding="application/x-tex">n</annotation></semantics></math> bits, i.e. <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>≫</mo><mi>n</mi><mo>=</mo><mi>x</mi><mi>/</mi><msup><mn>2</mn><mi>n</mi></msup></mrow><annotation encoding="application/x-tex">x \gg n = x / 2^n</annotation></semantics></math></li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi><mo>∥</mo><mi>Y</mi></mrow><annotation encoding="application/x-tex">X \mathbin{\|} Y</annotation></semantics></math> denotes the concatenation of two sequences <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>Y</mi><annotation encoding="application/x-tex">Y</annotation></semantics></math>, i.e. if <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mn>0</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mi>N</mi></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">X = \langle X_0, \dots, X_N \rangle</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Y</mi><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>Y</mi><mn>0</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>Y</mi><mi>M</mi></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">Y = \langle Y_0, \dots, Y_M \rangle</annotation></semantics></math> then <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi><mo>∥</mo><mi>Y</mi><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mn>0</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mi>N</mi></msub><mo>,</mo><msub><mi>Y</mi><mn>0</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>Y</mi><mi>M</mi></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">X \mathbin{\|} Y = \langle X_0, \dots, X_N, Y_0, \dots, Y_M \rangle</annotation></semantics></math></li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>min</mo><mo stretchy="false" form="prefix">(</mo><mi>x</mi><mo>,</mo><mi>y</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\min(x, y)</annotation></semantics></math> denotes the minimum of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>x</mi><annotation encoding="application/x-tex">x</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>y</mi><annotation encoding="application/x-tex">y</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>max</mo><mo stretchy="false" form="prefix">(</mo><mi>x</mi><mo>,</mo><mi>y</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\max(x, y)</annotation></semantics></math> denotes the maximum</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo>ROT</mo><mi>L</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>x</mi><mo>,</mo><mi>n</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\operatorname{ROT}_L(x, n)</annotation></semantics></math> denotes the rotation of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>x</mi><annotation encoding="application/x-tex">x</annotation></semantics></math> to the left by <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>n</mi><annotation encoding="application/x-tex">n</annotation></semantics></math> bits, i.e. <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo>ROT</mo><mi>L</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>x</mi><mo>,</mo><mi>n</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mo stretchy="false" form="prefix">(</mo><mi>x</mi><mo>≪</mo><mi>n</mi><mo stretchy="false" form="postfix">)</mo><mo>∨</mo><mo stretchy="false" form="prefix">(</mo><mi>x</mi><mo>≫</mo><mo stretchy="false" form="prefix">(</mo><mn>32</mn><mo>−</mo><mi>n</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\operatorname{ROT}_L(x, n) = (x \ll n) \vee (x \gg (32 - n))</annotation></semantics></math></li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>Type</mo><mo stretchy="false" form="prefix">(</mo><mi>x</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\operatorname{Type}(x)</annotation></semantics></math> denotes the type of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>x</mi><annotation encoding="application/x-tex">x</annotation></semantics></math>.</li>
</ul>
<p>We use standard mathematical notation for summation. For example:</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></msubsup><mi>i</mi></mrow><annotation encoding="application/x-tex">\sum_{i = 0}^{n} i</annotation></semantics></math></p>
<p>denotes the sum of integers in the range <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">[</mo><mn>0</mn><mo>,</mo><mi>n</mi><mo stretchy="false" form="postfix">]</mo></mrow><annotation encoding="application/x-tex">[0, n]</annotation></semantics></math>.</p>
<p>We define a similar notation for exclusive or:</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msubsup><mo>⨁</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></msubsup><mi>i</mi></mrow><annotation encoding="application/x-tex">\bigoplus_{i = 0}^{n} i</annotation></semantics></math></p>
<p>denotes the bitwise exclusive or of the integers in <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">[</mo><mn>0</mn><mo>,</mo><mi>n</mi><mo stretchy="false" form="postfix">]</mo></mrow><annotation encoding="application/x-tex">[0, n]</annotation></semantics></math>, i.e.</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msubsup><mo>⨁</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mi>n</mi></msubsup><mi>i</mi><mo>=</mo><mn>0</mn><mo>⊕</mo><mn>1</mn><mo>⊕</mo><mi>…</mi><mo>⊕</mo><mi>n</mi></mrow><annotation encoding="application/x-tex">\bigoplus_{i = 0}^{n} i = 0 \oplus 1 \oplus \dots \oplus n</annotation></semantics></math></p>
<p>Finally, we define the “prefix” <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mstyle mathvariant="double-struck"><mi>ℙ</mi></mstyle><mi>q</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\mathbb{P}_q(X)</annotation></semantics></math> of a non-empty sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> with respect to a given predicate <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>q</mi><annotation encoding="application/x-tex">q</annotation></semantics></math> to be the initial subsequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msup><mi>X</mi><mi>′</mi></msup><annotation encoding="application/x-tex">X^\prime</annotation></semantics></math> of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> up to and including the first member that makes <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>q</mi><mo stretchy="false" form="prefix">(</mo><msup><mi>X</mi><mi>′</mi></msup><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">q(X^\prime)</annotation></semantics></math> true. And we define the “remainder” <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mstyle mathvariant="double-struck"><mi>ℝ</mi></mstyle><mi>q</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\mathbb{R}_q(X)</annotation></semantics></math> to be everything left after removing the prefix.</p>
<p>Formally, given a sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mn>0</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mrow><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>−</mo><mn>1</mn></mrow></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">X = \langle X_0, \dots, X_{|X|-1} \rangle</annotation></semantics></math> and a predicate <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>q</mi><mo>∈</mo><mo>Type</mo><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo>→</mo><mo stretchy="false" form="prefix">{</mo><mtext mathvariant="normal">true</mtext><mo>,</mo><mtext mathvariant="normal">false</mtext><mo stretchy="false" form="postfix">}</mo></mrow><annotation encoding="application/x-tex">q \in \operatorname{Type}(X) \rightarrow \{\text{true},\text{false}\}</annotation></semantics></math>,</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mstyle mathvariant="double-struck"><mi>ℙ</mi></mstyle><mi>q</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mn>0</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mi>e</mi></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">\mathbb{P}_q(X) = \langle X_0, \dots, X_e \rangle</annotation></semantics></math></p>
<p>for the smallest integer <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>e</mi><annotation encoding="application/x-tex">e</annotation></semantics></math> such that:</p>
<ul>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>0</mn><mo>≤</mo><mi>e</mi><mo><</mo><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo></mrow><annotation encoding="application/x-tex">0 \le e < |X|</annotation></semantics></math> and</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>q</mi><mo stretchy="false" form="prefix">(</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mn>0</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mi>e</mi></msub><mo stretchy="false" form="postfix">⟩</mo><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mtext mathvariant="normal">true</mtext></mrow><annotation encoding="application/x-tex">q(\langle X_0, \dots, X_e \rangle) = \text{true}</annotation></semantics></math></li>
</ul>
<p>or <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>−</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">|X|-1</annotation></semantics></math> if no such integer exists. (I.e., if nothing satisfies <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>q</mi><annotation encoding="application/x-tex">q</annotation></semantics></math>, the prefix is the whole sequence.) And:</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mstyle mathvariant="double-struck"><mi>ℝ</mi></mstyle><mi>q</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mi>b</mi></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mrow><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>−</mo><mn>1</mn></mrow></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">\mathbb{R}_q(X) = \langle X_b, \dots, X_{|X|-1} \rangle</annotation></semantics></math></p>
<p>where <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>b</mi><mo>=</mo><mo stretchy="false" form="prefix">|</mo><msub><mstyle mathvariant="double-struck"><mi>ℙ</mi></mstyle><mi>q</mi></msub><mo stretchy="false" form="prefix">(</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mn>0</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mrow><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>−</mo><mn>1</mn></mrow></msub><mo stretchy="false" form="postfix">⟩</mo><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="prefix">|</mo></mrow><annotation encoding="application/x-tex">b = |\mathbb{P}_q(\langle X_0, \dots, X_{|X|-1} \rangle)|</annotation></semantics></math>.</p>
<p>Note that when <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mstyle mathvariant="double-struck"><mi>ℙ</mi></mstyle><mi>q</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mi>X</mi></mrow><annotation encoding="application/x-tex">\mathbb{P}_q(X) = X</annotation></semantics></math>, <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mstyle mathvariant="double-struck"><mi>ℝ</mi></mstyle><mi>q</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">\mathbb{R}_q(X) = \langle \rangle</annotation></semantics></math>.</p>
<h1 id="splitting">Splitting</h1>
<p>The primary result of this specification is to define a family of functions:</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo>SPLIT</mo><mi>C</mi></msub><mo>∈</mo><msub><mi>V</mi><mn>8</mn></msub><mo>→</mo><msub><mi>V</mi><mi>v</mi></msub></mrow><annotation encoding="application/x-tex">\operatorname{SPLIT}_C \in V_8 \rightarrow V_v</annotation></semantics></math></p>
<p>...which is parameterized by a <em>configuration</em> <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>C</mi><annotation encoding="application/x-tex">C</annotation></semantics></math>, consisting of:</p>
<ul>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>S</mi><mtext mathvariant="normal">min</mtext></msub><mo>∈</mo><msub><mi>U</mi><mn>32</mn></msub></mrow><annotation encoding="application/x-tex">S_{\text{min}} \in U_{32}</annotation></semantics></math>, the minimum split size</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>S</mi><mtext mathvariant="normal">max</mtext></msub><mo>∈</mo><msub><mi>U</mi><mn>32</mn></msub></mrow><annotation encoding="application/x-tex">S_{\text{max}} \in U_{32}</annotation></semantics></math>, the maximum split size</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>H</mi><mo>∈</mo><msub><mi>V</mi><mn>8</mn></msub><mo>→</mo><msub><mi>U</mi><mn>32</mn></msub></mrow><annotation encoding="application/x-tex">H \in V_8 \rightarrow U_{32}</annotation></semantics></math>, the hash function</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>T</mi><mo>∈</mo><msub><mi>U</mi><mn>32</mn></msub></mrow><annotation encoding="application/x-tex">T \in U_{32}</annotation></semantics></math>, the threshold</li>
</ul>
<p>The configuration must satisfy <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>S</mi><mtext mathvariant="normal">max</mtext></msub><mo>≥</mo><msub><mi>S</mi><mtext mathvariant="normal">min</mtext></msub><mo>></mo><mn>0</mn></mrow><annotation encoding="application/x-tex">S_{\text{max}} \ge S_{\text{min}} > 0</annotation></semantics></math>.</p>
<h2 id="definitions">Definitions</h2>
<p>We define the constant <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>W</mi><annotation encoding="application/x-tex">W</annotation></semantics></math>, which we call the "window size," to be 64.</p>
<p>We define the predicate <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>q</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">q_C(X)</annotation></semantics></math> on a non-empty byte sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> with respect to a configuration <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>C</mi><annotation encoding="application/x-tex">C</annotation></semantics></math> to be:</p>
<ul>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtext mathvariant="normal">true</mtext><annotation encoding="application/x-tex">\text{true}</annotation></semantics></math> if <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>=</mo><msub><mi>S</mi><mtext mathvariant="normal">max</mtext></msub></mrow><annotation encoding="application/x-tex">|X| = S_{\text{max}}</annotation></semantics></math>; otherwise</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtext mathvariant="normal">true</mtext><annotation encoding="application/x-tex">\text{true}</annotation></semantics></math> if <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>≥</mo><msub><mi>S</mi><mtext mathvariant="normal">min</mtext></msub></mrow><annotation encoding="application/x-tex">|X| \ge S_{\text{min}}</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>H</mi><mo stretchy="false" form="prefix">(</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mrow><mo>max</mo><mo stretchy="false" form="prefix">(</mo><mn>0</mn><mo>,</mo><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>−</mo><mi>W</mi><mo stretchy="false" form="postfix">)</mo></mrow></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mrow><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>−</mo><mn>1</mn></mrow></msub><mo stretchy="false" form="postfix">⟩</mo><mo stretchy="false" form="postfix">)</mo><mo>mod</mo><msup><mn>2</mn><mi>T</mi></msup><mo>=</mo><mn>0</mn></mrow><annotation encoding="application/x-tex">H(\langle X_{\max(0,|X|-W)}, \dots, X_{|X|-1} \rangle) \mod 2^T = 0</annotation></semantics></math> (i.e., the last <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>W</mi><annotation encoding="application/x-tex">W</annotation></semantics></math> bytes of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> hash to a value with at least <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>T</mi><annotation encoding="application/x-tex">T</annotation></semantics></math> trailing zeroes); otherwise</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtext mathvariant="normal">false</mtext><annotation encoding="application/x-tex">\text{false}</annotation></semantics></math>.</li>
</ul>
<p>We define <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo>SPLIT</mo><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\operatorname{SPLIT}_C(X)</annotation></semantics></math> recursively, as follows:</p>
<ul>
<li>If <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>=</mo><mn>0</mn></mrow><annotation encoding="application/x-tex">|X| = 0</annotation></semantics></math>, <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo>SPLIT</mo><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">\operatorname{SPLIT}_C(X) = \langle \rangle</annotation></semantics></math></li>
<li>Otherwise, <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mo>SPLIT</mo><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mstyle mathvariant="double-struck"><mi>ℙ</mi></mstyle><msub><mi>q</mi><mi>C</mi></msub></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">⟩</mo><mo>∥</mo><msub><mo>SPLIT</mo><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><msub><mstyle mathvariant="double-struck"><mi>ℝ</mi></mstyle><msub><mi>q</mi><mi>C</mi></msub></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\operatorname{SPLIT}_C(X) = \langle \mathbb{P}_{q_C}(X) \rangle \mathbin{\|} \operatorname{SPLIT}_C(\mathbb{R}_{q_C}(X))</annotation></semantics></math></li>
</ul>
<h1 id="tree-construction">Tree Construction</h1>
<p>If sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> and sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>Y</mi><annotation encoding="application/x-tex">Y</annotation></semantics></math> are largely the same, <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mo>SPLIT</mo><mi>C</mi></msub><annotation encoding="application/x-tex">\operatorname{SPLIT}_C</annotation></semantics></math> will produce mostly the same chunks, choosing the same locations for chunk boundaries except in the vicinity of whatever differences there are between <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>Y</mi><annotation encoding="application/x-tex">Y</annotation></semantics></math>.</p>
<p>This has obvious benefits for storage and bandwidth, as the same chunks can represent both <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>Y</mi><annotation encoding="application/x-tex">Y</annotation></semantics></math> with few exceptions. But while only a small number of chunks may change, the <em>sequence</em> of chunks may get totally rewritten, as when a difference exists near the beginning of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>Y</mi><annotation encoding="application/x-tex">Y</annotation></semantics></math> and all subsequent chunks have to “shift position” to the left or right. Representing the two different sequences may therefore require space that is linear in the size of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>Y</mi><annotation encoding="application/x-tex">Y</annotation></semantics></math>.</p>
<p>We can do better, requiring space that is only <em>logarithmic</em> in the size of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>Y</mi><annotation encoding="application/x-tex">Y</annotation></semantics></math>, by organizing the chunks in a tree whose shape, like the chunk boundaries themselves, is determined by the content of the input. The trees representing two slightly different versions of the same input will differ only in the subtrees in the vicinity of the differences.</p>
<h2 id="definitions-1">Definitions</h2>
<p>A “chunk” is a member of the sequence produced by <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mo>SPLIT</mo><mi>C</mi></msub><annotation encoding="application/x-tex">\operatorname{SPLIT}_C</annotation></semantics></math>.</p>
<p>The “hashval” <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>V</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">V_C(X)</annotation></semantics></math> of a byte sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> is:</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>H</mi><mo stretchy="false" form="prefix">(</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mrow><mo>max</mo><mo stretchy="false" form="prefix">(</mo><mn>0</mn><mo>,</mo><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>−</mo><mi>W</mi><mo stretchy="false" form="postfix">)</mo></mrow></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mrow><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>−</mo><mn>1</mn></mrow></msub><mo stretchy="false" form="postfix">⟩</mo><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">H(\langle X_{\max(0, |X|-W)}, \dots, X_{|X|-1} \rangle)</annotation></semantics></math></p>
<p>(i.e., the hash of the last <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>W</mi><annotation encoding="application/x-tex">W</annotation></semantics></math> bytes of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math>).</p>
<p>A “node” <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mrow><mi>h</mi><mo>,</mo><mi>i</mi></mrow></msub><annotation encoding="application/x-tex">N_{h,i}</annotation></semantics></math> in a hashsplit tree at non-negative “height” <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>h</mi><annotation encoding="application/x-tex">h</annotation></semantics></math> is a sequence of children. The children of a node at height 0 are chunks. The children of a node at height <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>h</mi><mo>+</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">h+1</annotation></semantics></math> are nodes at height <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>h</mi><annotation encoding="application/x-tex">h</annotation></semantics></math>.</p>
<p>A “tier” of a hashsplit tree is a sequence of nodes <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>N</mi><mi>h</mi></msub><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>N</mi><mrow><mi>h</mi><mo>,</mo><mn>0</mn></mrow></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>N</mi><mrow><mi>h</mi><mo>,</mo><mi>k</mi></mrow></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">N_h = \langle N_{h,0}, \dots, N_{h,k} \rangle</annotation></semantics></math> at a given height <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>h</mi><annotation encoding="application/x-tex">h</annotation></semantics></math>.</p>
<p>The function <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>Rightmost</mo><mo stretchy="false" form="prefix">(</mo><msub><mi>N</mi><mrow><mi>h</mi><mo>,</mo><mi>i</mi></mrow></msub><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\operatorname{Rightmost}(N_{h,i})</annotation></semantics></math> on a node <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>N</mi><mrow><mi>h</mi><mo>,</mo><mi>i</mi></mrow></msub><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>S</mi><mn>0</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>S</mi><mi>e</mi></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">N_{h,i} = \langle S_0, \dots, S_e \rangle</annotation></semantics></math> produces the “rightmost leaf chunk” defined recursively as follows:</p>
<ul>
<li>If <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>h</mi><mo>=</mo><mn>0</mn></mrow><annotation encoding="application/x-tex">h = 0</annotation></semantics></math>, <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>Rightmost</mo><mo stretchy="false" form="prefix">(</mo><msub><mi>N</mi><mrow><mi>h</mi><mo>,</mo><mi>i</mi></mrow></msub><mo stretchy="false" form="postfix">)</mo><mo>=</mo><msub><mi>S</mi><mi>e</mi></msub></mrow><annotation encoding="application/x-tex">\operatorname{Rightmost}(N_{h,i}) = S_e</annotation></semantics></math></li>
<li>If <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>h</mi><mo>></mo><mn>0</mn></mrow><annotation encoding="application/x-tex">h > 0</annotation></semantics></math>, <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>Rightmost</mo><mo stretchy="false" form="prefix">(</mo><msub><mi>N</mi><mrow><mi>h</mi><mo>,</mo><mi>i</mi></mrow></msub><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mo>Rightmost</mo><mo stretchy="false" form="prefix">(</mo><msub><mi>S</mi><mi>e</mi></msub><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\operatorname{Rightmost}(N_{h,i}) = \operatorname{Rightmost}(S_e)</annotation></semantics></math></li>
</ul>
<p>The “level” <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">L_C(X)</annotation></semantics></math> of a given chunk <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> is <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>max</mo><mo stretchy="false" form="prefix">(</mo><mn>0</mn><mo>,</mo><mi>Q</mi><mo>−</mo><mi>T</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\max(0, Q - T)</annotation></semantics></math>, where <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>Q</mi><annotation encoding="application/x-tex">Q</annotation></semantics></math> is the largest integer such that</p>
<ul>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Q</mi><mo>≤</mo><mn>32</mn></mrow><annotation encoding="application/x-tex">Q \le 32</annotation></semantics></math> and</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>V</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><msub><mstyle mathvariant="double-struck"><mi>ℙ</mi></mstyle><msub><mi>q</mi><mi>C</mi></msub></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">)</mo><mo>mod</mo><msup><mn>2</mn><mi>Q</mi></msup><mo>=</mo><mn>0</mn></mrow><annotation encoding="application/x-tex">V_C(\mathbb{P}_{q_C}(X)) \mod 2^Q = 0</annotation></semantics></math></li>
</ul>
<p>(i.e., the level is the number of trailing zeroes in the hashval in excess of the threshold needed to produce the prefix chunk <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mstyle mathvariant="double-struck"><mi>ℙ</mi></mstyle><msub><mi>q</mi><mi>C</mi></msub></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\mathbb{P}_{q_C}(X)</annotation></semantics></math>).</p>
<p>The level <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>N</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">L_C(N)</annotation></semantics></math> of a given <em>node</em> <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>N</mi><annotation encoding="application/x-tex">N</annotation></semantics></math> is the level of its rightmost leaf chunk: <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>N</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><msub><mi>L</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mo>Rightmost</mo><mo stretchy="false" form="prefix">(</mo><mi>N</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">L_C(N) = L_C(\operatorname{Rightmost}(N))</annotation></semantics></math></p>
<p>The predicate <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>z</mi><mrow><mi>C</mi><mo>,</mo><mi>h</mi></mrow></msub><mo stretchy="false" form="prefix">(</mo><mi>K</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">z_{C,h}(K)</annotation></semantics></math> on a sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>K</mi><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>K</mi><mn>0</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>K</mi><mi>e</mi></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">K = \langle K_0, \dots, K_e \rangle</annotation></semantics></math> of chunks or of nodes with respect to a height <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>h</mi><annotation encoding="application/x-tex">h</annotation></semantics></math> is defined as:</p>
<ul>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtext mathvariant="normal">true</mtext><annotation encoding="application/x-tex">\text{true}</annotation></semantics></math> if <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>L</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><msub><mi>K</mi><mi>e</mi></msub><mo stretchy="false" form="postfix">)</mo><mo>></mo><mi>h</mi></mrow><annotation encoding="application/x-tex">L_C(K_e) > h</annotation></semantics></math>; otherwise</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mtext mathvariant="normal">false</mtext><annotation encoding="application/x-tex">\text{false}</annotation></semantics></math>.</li>
</ul>
<p>For conciseness, define</p>
<ul>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>P</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><msub><mstyle mathvariant="double-struck"><mi>ℙ</mi></mstyle><msub><mi>z</mi><mrow><mi>C</mi><mo>,</mo><mn>0</mn></mrow></msub></msub><mo stretchy="false" form="prefix">(</mo><msub><mo>SPLIT</mo><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">P_C(X) = \mathbb{P}_{z_{C,0}}(\operatorname{SPLIT}_C(X))</annotation></semantics></math> and</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>R</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><msub><mstyle mathvariant="double-struck"><mi>ℝ</mi></mstyle><msub><mi>z</mi><mrow><mi>C</mi><mo>,</mo><mn>0</mn></mrow></msub></msub><mo stretchy="false" form="prefix">(</mo><msub><mo>SPLIT</mo><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">R_C(X) = \mathbb{R}_{z_{C,0}}(\operatorname{SPLIT}_C(X))</annotation></semantics></math></li>
</ul>
<h2 id="algorithm">Algorithm</h2>
<p>This section contains two descriptions of hashsplit trees: an algebraic description for formal reasoning, and a procedural description for practical construction.</p>
<h3 id="algebraic-description">Algebraic description</h3>
<p>The tier <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mn>0</mn></msub><annotation encoding="application/x-tex">N_0</annotation></semantics></math> of hashsplit tree nodes for a given byte sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> is equal to</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">⟨</mo><msub><mi>P</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">⟩</mo><mstyle mathvariant="double-struck"><mo stretchy="false" form="prefix">∥</mo></mstyle><msub><mi>R</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\langle P_C(X) \rangle \mathbb{\|} R_C(X)</annotation></semantics></math></p>
<p>The tier <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mrow><mi>h</mi><mo>+</mo><mn>1</mn></mrow></msub><annotation encoding="application/x-tex">N_{h+1}</annotation></semantics></math> of hashsplit tree nodes for a given byte sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math> is equal to</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">⟨</mo><msub><mstyle mathvariant="double-struck"><mi>ℙ</mi></mstyle><msub><mi>z</mi><mrow><mi>C</mi><mo>,</mo><mi>h</mi><mo>+</mo><mn>1</mn></mrow></msub></msub><mo stretchy="false" form="prefix">(</mo><msub><mi>N</mi><mi>h</mi></msub><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">⟩</mo><mstyle mathvariant="double-struck"><mo stretchy="false" form="prefix">∥</mo></mstyle><msub><mstyle mathvariant="double-struck"><mi>ℝ</mi></mstyle><msub><mi>z</mi><mrow><mi>C</mi><mo>,</mo><mi>h</mi><mo>+</mo><mn>1</mn></mrow></msub></msub><mo stretchy="false" form="prefix">(</mo><msub><mi>N</mi><mi>h</mi></msub><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\langle \mathbb{P}_{z_{C,h+1}}(N_h) \rangle \mathbb{\|} \mathbb{R}_{z_{C,h+1}}(N_h)</annotation></semantics></math></p>
<p>(I.e., each node in the tree has as its children a sequence of chunks or lower-tier nodes, as appropriate, up to and including the first one whose “level” is greater than the node’s height.)</p>
<p>The root of the hashsplit tree is <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mrow><msup><mi>h</mi><mi>′</mi></msup><mo>,</mo><mn>0</mn></mrow></msub><annotation encoding="application/x-tex">N_{h^\prime,0}</annotation></semantics></math> for the smallest value of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msup><mi>h</mi><mi>′</mi></msup><annotation encoding="application/x-tex">h^\prime</annotation></semantics></math> such that <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">|</mo><msub><mi>N</mi><msup><mi>h</mi><mi>′</mi></msup></msub><mo stretchy="false" form="prefix">|</mo><mo>=</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">|N_{h^\prime}| = 1</annotation></semantics></math></p>
<h3 id="procedural-description">Procedural description</h3>
<p>For this description we use <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mi>h</mi></msub><annotation encoding="application/x-tex">N_h</annotation></semantics></math> to denote a single node at height <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>h</mi><annotation encoding="application/x-tex">h</annotation></semantics></math>. The algorithm must keep track of the “rightmost” such node for each tier in the tree.</p>
<p>To compute a hashsplit tree from a byte sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>X</mi><annotation encoding="application/x-tex">X</annotation></semantics></math>, compute its “root node” as follows.</p>
<ol style="list-style-type: decimal">
<li>Let <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mn>0</mn></msub><annotation encoding="application/x-tex">N_0</annotation></semantics></math> be <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">⟨</mo><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">\langle\rangle</annotation></semantics></math> (i.e., a node at height 0 with no children).</li>
<li>If <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>=</mo><mn>0</mn></mrow><annotation encoding="application/x-tex">|X| = 0</annotation></semantics></math>, then:
<ol style="list-style-type: lower-alpha">
<li>Let <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>h</mi><annotation encoding="application/x-tex">h</annotation></semantics></math> be the largest height such that <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mi>h</mi></msub><annotation encoding="application/x-tex">N_h</annotation></semantics></math> exists.</li>
<li>If <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">|</mo><msub><mi>N</mi><mn>0</mn></msub><mo stretchy="false" form="prefix">|</mo><mo>></mo><mn>0</mn></mrow><annotation encoding="application/x-tex">|N_0| > 0</annotation></semantics></math>, then:
<ol style="list-style-type: lower-roman">
<li>For each integer <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>i</mi><annotation encoding="application/x-tex">i</annotation></semantics></math> in <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">[</mo><mn>0</mn><mi>.</mi><mi>.</mi><mi>h</mi><mo stretchy="false" form="postfix">]</mo></mrow><annotation encoding="application/x-tex">[0 .. h]</annotation></semantics></math>, “close” <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mi>i</mi></msub><annotation encoding="application/x-tex">N_i</annotation></semantics></math> (see below).</li>
<li>Set <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>h</mi><mo>←</mo><mi>h</mi><mo>+</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">h \leftarrow h+1</annotation></semantics></math>.</li>
</ol></li>
<li>[pruning] While <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>h</mi><mo>></mo><mn>0</mn></mrow><annotation encoding="application/x-tex">h > 0</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">|</mo><msub><mi>N</mi><mi>h</mi></msub><mo stretchy="false" form="prefix">|</mo><mo>=</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">|N_h| = 1</annotation></semantics></math>, set <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>h</mi><mo>←</mo><mi>h</mi><mo>−</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">h \leftarrow h-1</annotation></semantics></math> (i.e., traverse from the prospective tree root downward until there is a node with more than one child).</li>
<li><strong>Terminate</strong> with <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mi>h</mi></msub><annotation encoding="application/x-tex">N_h</annotation></semantics></math> as the root node.</li>
</ol></li>
<li>Otherwise, set <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>N</mi><mn>0</mn></msub><mo>←</mo><msub><mi>N</mi><mn>0</mn></msub><mo>∥</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>P</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">N_0 \leftarrow N_0 \mathbin{\|} \langle P_C(X) \rangle</annotation></semantics></math> (i.e., add <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>P</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">P_C(X)</annotation></semantics></math> to the list of children in <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mn>0</mn></msub><annotation encoding="application/x-tex">N_0</annotation></semantics></math>).</li>
<li>For each integer <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>i</mi><annotation encoding="application/x-tex">i</annotation></semantics></math> in <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">[</mo><mn>0</mn><mi>.</mi><mi>.</mi><msub><mi>L</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">[0 .. L_C(X))</annotation></semantics></math>, “close” the node <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mi>i</mi></msub><annotation encoding="application/x-tex">N_i</annotation></semantics></math> (see below).</li>
<li>Set <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi><mo>←</mo><msub><mi>R</mi><mi>C</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">X \leftarrow R_C(X)</annotation></semantics></math>.</li>
<li>Go to step 2.</li>
</ol>
<p>To “close” a node <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mi>i</mi></msub><annotation encoding="application/x-tex">N_i</annotation></semantics></math>:</p>
<ol style="list-style-type: decimal">
<li>If no <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow></msub><annotation encoding="application/x-tex">N_{i+1}</annotation></semantics></math> exists yet, let <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow></msub><annotation encoding="application/x-tex">N_{i+1}</annotation></semantics></math> be <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">⟨</mo><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">\langle\rangle</annotation></semantics></math> (i.e., a node at height <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow><annotation encoding="application/x-tex">{i + 1}</annotation></semantics></math> with no children).</li>
<li>Set <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>N</mi><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>←</mo><msub><mi>N</mi><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>∥</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>N</mi><mi>i</mi></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">N_{i+1} \leftarrow N_{i+1} \mathbin{\|} \langle N_i \rangle</annotation></semantics></math> (i.e., add <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mi>i</mi></msub><annotation encoding="application/x-tex">N_i</annotation></semantics></math> as a child to <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow></msub><annotation encoding="application/x-tex">N_{i+1}</annotation></semantics></math>).</li>
<li>Let <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>N</mi><mi>i</mi></msub><annotation encoding="application/x-tex">N_i</annotation></semantics></math> be <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">⟨</mo><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">\langle\rangle</annotation></semantics></math> (i.e., new node at height <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>i</mi><annotation encoding="application/x-tex">i</annotation></semantics></math> with no children).</li>
</ol>
<h1 id="rolling-hash-functions">Rolling Hash Functions</h1>
<h2 id="cp32">CP32</h2>
<p>The <code>cp32</code> hash function is based on cyclic polynomials. The family of related functions is sometimes also called "buzhash." <code>cp32</code> is the recommended hash function for use with hashsplit; use it unless you have clear reasons for doing otherwise.</p>
<h3 id="definition">Definition</h3>
<p>We define the function <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>CP32</mo><mo>∈</mo><msub><mi>V</mi><mn>8</mn></msub><mo>→</mo><msub><mi>U</mi><mn>32</mn></msub></mrow><annotation encoding="application/x-tex">\operatorname{CP32} \in V_8 \rightarrow U_{32}</annotation></semantics></math> as:</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>CP32</mo><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><msubsup><mo>⨁</mo><mrow><mi>i</mi><mo>=</mo><mn>0</mn></mrow><mrow><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>−</mo><mn>1</mn></mrow></msubsup><msub><mo>ROT</mo><mi>L</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>g</mi><mo stretchy="false" form="prefix">(</mo><msub><mi>X</mi><mi>i</mi></msub><mo stretchy="false" form="postfix">)</mo><mo>,</mo><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>−</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\operatorname{CP32}(X) = \bigoplus_{i = 0}^{|X| - 1} \operatorname{ROT}_L(g(X_i), |X| - i + 1)</annotation></semantics></math></p>
<p>Where <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>g</mi><mo stretchy="false" form="prefix">(</mo><mi>n</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><msub><mi>G</mi><mi>n</mi></msub></mrow><annotation encoding="application/x-tex">g(n) = G_n</annotation></semantics></math> and the sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>G</mi><mo>∈</mo><msub><mi>V</mi><mn>32</mn></msub></mrow><annotation encoding="application/x-tex">G \in V_{32}</annotation></semantics></math> is defined in the appendix.</p>
<p>The sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>G</mi><annotation encoding="application/x-tex">G</annotation></semantics></math> was chosen at random. Note that <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">|</mo><mi>G</mi><mo stretchy="false" form="prefix">|</mo><mo>=</mo><mn>256</mn></mrow><annotation encoding="application/x-tex">|G| = 256</annotation></semantics></math>, so <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>g</mi><mo stretchy="false" form="prefix">(</mo><mi>n</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">g(n)</annotation></semantics></math> is always defined.</p>
<h3 id="implementation">Implementation</h3>
<h2 id="rolling">Rolling</h2>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mo>CP32</mo><annotation encoding="application/x-tex">\operatorname{CP32}</annotation></semantics></math> can be computed in a rolling fashion; for sequences</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>X</mi><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mn>0</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mi>N</mi></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">X = \langle X_0, \dots, X_N \rangle</annotation></semantics></math></p>
<p>and</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>Y</mi><mo>=</mo><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mn>1</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mi>N</mi></msub><mo>,</mo><mi>y</mi><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">Y = \langle X_1, \dots, X_N, y \rangle</annotation></semantics></math></p>
<p>Given <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>CP32</mo><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\operatorname{CP32}(X)</annotation></semantics></math>, <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>X</mi><mn>0</mn></msub><annotation encoding="application/x-tex">X_0</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>y</mi><annotation encoding="application/x-tex">y</annotation></semantics></math>, we can compute <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>CP32</mo><mo stretchy="false" form="prefix">(</mo><mi>Y</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\operatorname{CP32}(Y)</annotation></semantics></math> as:</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>CP32</mo><mo stretchy="false" form="prefix">(</mo><mi>Y</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><msub><mo>ROT</mo><mi>L</mi></msub><mo stretchy="false" form="prefix">(</mo><mo>CP32</mo><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo>,</mo><mn>1</mn><mo stretchy="false" form="postfix">)</mo><mo>⊕</mo><msub><mo>ROT</mo><mi>L</mi></msub><mo stretchy="false" form="prefix">(</mo><mi>g</mi><mo stretchy="false" form="prefix">(</mo><msub><mi>X</mi><mn>0</mn></msub><mo stretchy="false" form="postfix">)</mo><mo>,</mo><mo stretchy="false" form="prefix">|</mo><mi>X</mi><mo stretchy="false" form="prefix">|</mo><mo>mod</mo><mn>32</mn><mo stretchy="false" form="postfix">)</mo><mo>⊕</mo><mi>g</mi><mo stretchy="false" form="prefix">(</mo><mi>y</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\operatorname{CP32}(Y) = \operatorname{ROT}_L(\operatorname{CP32}(X), 1) \oplus \operatorname{ROT}_L(g(X_0), |X| \mod 32) \oplus g(y)</annotation></semantics></math>.</p>
<p>Note that the splitting algorithm only computes hashes on sequences of size <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>W</mi><mo>=</mo><mn>64</mn></mrow><annotation encoding="application/x-tex">W = 64</annotation></semantics></math>, and since 64 is a multiple of 32 this means that for the purposes of splitting, the above can be simplified to:</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>CP32</mo><mo stretchy="false" form="prefix">(</mo><mi>Y</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><msub><mo>ROT</mo><mi>L</mi></msub><mo stretchy="false" form="prefix">(</mo><mo>CP32</mo><mo stretchy="false" form="prefix">(</mo><mi>X</mi><mo stretchy="false" form="postfix">)</mo><mo>,</mo><mn>1</mn><mo stretchy="false" form="postfix">)</mo><mo>⊕</mo><mi>g</mi><mo stretchy="false" form="prefix">(</mo><msub><mi>X</mi><mn>0</mn></msub><mo stretchy="false" form="postfix">)</mo><mo>⊕</mo><mi>g</mi><mo stretchy="false" form="prefix">(</mo><mi>y</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">\operatorname{CP32}(Y) = \operatorname{ROT}_L(\operatorname{CP32}(X), 1) \oplus g(X_0) \oplus g(y)</annotation></semantics></math>.</p>
<h2 id="the-rrs-rolling-checksums">The RRS Rolling Checksums</h2>
<p>The <code>rrs</code> family of checksums is based on an algorithm first used in <a href="https://rsync.samba.org/tech_report/node3.html">rsync</a>, and later adapted for use in <a href="https://bup.github.io/">bup</a> and <a href="https://perkeep.org/">perkeep</a>. <code>rrs</code> was originally inspired by the adler-32 checksum. The name <code>rrs</code> was chosen for this specification, and stands for <code>rsync rolling sum</code>.</p>
<h3 id="definition-1">Definition</h3>
<p>A concrete <code>rrs</code> checksum is defined by the parameters:</p>
<ul>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>M</mi><annotation encoding="application/x-tex">M</annotation></semantics></math>, the modulus</li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>c</mi><annotation encoding="application/x-tex">c</annotation></semantics></math>, the character offset</li>
</ul>
<p>Given a sequence of bytes <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mn>0</mn></msub><mo>,</mo><msub><mi>X</mi><mn>1</mn></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mi>N</mi></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">\langle X_0, X_1, \dots, X_N \rangle</annotation></semantics></math> and a choice of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>M</mi><annotation encoding="application/x-tex">M</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>c</mi><annotation encoding="application/x-tex">c</annotation></semantics></math>, the <code>rrs</code> hash of the sub-sequence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mi>k</mi></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mi>l</mi></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">\langle X_k, \dots, X_l \rangle</annotation></semantics></math> is <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">s(k, l)</annotation></semantics></math>, where:</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>a</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mo stretchy="false" form="prefix">(</mo><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mi>k</mi></mrow><mi>l</mi></msubsup><mo stretchy="false" form="prefix">(</mo><msub><mi>X</mi><mi>i</mi></msub><mo>+</mo><mi>c</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">)</mo><mo>mod</mo><mi>M</mi></mrow><annotation encoding="application/x-tex">a(k, l) = (\sum_{i = k}^{l} (X_i + c)) \mod M</annotation></semantics></math></p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>b</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mo stretchy="false" form="prefix">(</mo><msubsup><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mi>k</mi></mrow><mi>l</mi></msubsup><mo stretchy="false" form="prefix">(</mo><mi>l</mi><mo>−</mo><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="prefix">(</mo><msub><mi>X</mi><mi>i</mi></msub><mo>+</mo><mi>c</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">)</mo><mo>mod</mo><mi>M</mi></mrow><annotation encoding="application/x-tex">b(k, l) = (\sum_{i = k}^{l} (l - i + 1)(X_i + c)) \mod M</annotation></semantics></math></p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mi>b</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo><mo>+</mo><msup><mn>2</mn><mn>16</mn></msup><mi>a</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">s(k, l) = b(k, l) + 2^{16}a(k, l)</annotation></semantics></math></p>
<h4 id="rrs1">RRS1</h4>
<p>The concrete hash called <code>rrs1</code> uses the values:</p>
<ul>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>M</mi><mo>=</mo><msup><mn>2</mn><mn>16</mn></msup></mrow><annotation encoding="application/x-tex">M = 2^{16}</annotation></semantics></math></li>
<li><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>c</mi><mo>=</mo><mn>31</mn></mrow><annotation encoding="application/x-tex">c = 31</annotation></semantics></math></li>
</ul>
<p><code>rrs1</code> is used by both Bup and Perkeep, and implemented by the Go package <code>go4.org/rollsum</code>.</p>
<h3 id="implementation-1">Implementation</h3>
<h4 id="rolling-1">Rolling</h4>
<p><code>rrs</code> is a family of <em>rolling</em> hashes. We can compute hashes in a rolling fashion by taking advantage of the fact that, for <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>l</mi><mo>≥</mo><mi>k</mi><mo>≥</mo><mn>0</mn></mrow><annotation encoding="application/x-tex">l \geq k \geq 0</annotation></semantics></math>:</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>a</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>+</mo><mn>1</mn><mo>,</mo><mi>l</mi><mo>+</mo><mn>1</mn><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mo stretchy="false" form="prefix">(</mo><mi>a</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo><mo>−</mo><mo stretchy="false" form="prefix">(</mo><msub><mi>X</mi><mi>k</mi></msub><mo>+</mo><mi>c</mi><mo stretchy="false" form="postfix">)</mo><mo>+</mo><mo stretchy="false" form="prefix">(</mo><msub><mi>X</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>+</mo><mi>c</mi><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">)</mo><mo>mod</mo><mi>M</mi></mrow><annotation encoding="application/x-tex">a(k + 1, l + 1) = (a(k, l) - (X_k + c) + (X_{l+1} + c)) \mod M</annotation></semantics></math></p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>b</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>+</mo><mn>1</mn><mo>,</mo><mi>l</mi><mo>+</mo><mn>1</mn><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mo stretchy="false" form="prefix">(</mo><mi>b</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo><mo>−</mo><mo stretchy="false" form="prefix">(</mo><mi>l</mi><mo>−</mo><mi>k</mi><mo>+</mo><mn>1</mn><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="prefix">(</mo><msub><mi>X</mi><mi>k</mi></msub><mo>+</mo><mi>c</mi><mo stretchy="false" form="postfix">)</mo><mo>+</mo><mi>a</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>+</mo><mn>1</mn><mo>,</mo><mi>l</mi><mo>+</mo><mn>1</mn><mo stretchy="false" form="postfix">)</mo><mo stretchy="false" form="postfix">)</mo><mo>mod</mo><mi>M</mi></mrow><annotation encoding="application/x-tex">b(k + 1, l + 1) = (b(k, l) - (l - k + 1)(X_k + c) + a(k + 1, l + 1)) \mod M</annotation></semantics></math></p>
<p>So, a typical implementation will work like this:</p>
<ul>
<li>Keep <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false" form="prefix">⟨</mo><msub><mi>X</mi><mi>k</mi></msub><mo>,</mo><mi>…</mi><mo>,</mo><msub><mi>X</mi><mi>l</mi></msub><mo stretchy="false" form="postfix">⟩</mo></mrow><annotation encoding="application/x-tex">\langle X_k, \dots, X_l \rangle</annotation></semantics></math> in a ring buffer.</li>
<li>Also store <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>a</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">a(k, l)</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>b</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">b(k, l)</annotation></semantics></math>.</li>
<li>When <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>X</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn></mrow></msub><annotation encoding="application/x-tex">X_{l+1}</annotation></semantics></math> is added to the hash:</li>
<li>Dequeue <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>X</mi><mi>k</mi></msub><annotation encoding="application/x-tex">X_k</annotation></semantics></math> from the ring buffer, and enqueue <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>X</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn></mrow></msub><annotation encoding="application/x-tex">X_{l+1}</annotation></semantics></math>.</li>
<li>Use <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>X</mi><mi>k</mi></msub><annotation encoding="application/x-tex">X_k</annotation></semantics></math>, <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><msub><mi>X</mi><mrow><mi>l</mi><mo>+</mo><mn>1</mn></mrow></msub><annotation encoding="application/x-tex">X_{l+1}</annotation></semantics></math>, and the stored <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>a</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">a(k, l)</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>b</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">b(k, l)</annotation></semantics></math> to compute <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>a</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>+</mo><mn>1</mn><mo>,</mo><mi>l</mi><mo>+</mo><mn>1</mn><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">a(k + 1, l + 1)</annotation></semantics></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>b</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>+</mo><mn>1</mn><mo>,</mo><mi>l</mi><mo>+</mo><mn>1</mn><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">b(k + 1, l + 1)</annotation></semantics></math>. Then use those values to compute <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>+</mo><mn>1</mn><mo>,</mo><mi>l</mi><mo>+</mo><mn>1</mn><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">s(k + 1, l + 1)</annotation></semantics></math> and also store them for future use.</li>
</ul>
<p>In all cases the ring buffer should initially contain all zero bytes, reflecting the use of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>X</mi><mi>i</mi></msub><mo>=</mo><mn>0</mn></mrow><annotation encoding="application/x-tex">X_i = 0</annotation></semantics></math> for <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>i</mi><mo><</mo><mn>0</mn></mrow><annotation encoding="application/x-tex">i < 0</annotation></semantics></math> in <a href="#splitting">"Splitting"</a>, above.</p>
<h4 id="choice-of-m">Choice of M</h4>
<p>Choosing <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>M</mi><mo>=</mo><msup><mn>2</mn><mn>16</mn></msup></mrow><annotation encoding="application/x-tex">M = 2^{16}</annotation></semantics></math> has the advantages of simplicity and efficiency, as it allows <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">s(k, l)</annotation></semantics></math> to be computed using only shifts and bitwise operators:</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo><mo>=</mo><mi>b</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo><mo>∨</mo><mo stretchy="false" form="prefix">(</mo><mi>a</mi><mo stretchy="false" form="prefix">(</mo><mi>k</mi><mo>,</mo><mi>l</mi><mo stretchy="false" form="postfix">)</mo><mo>≪</mo><mn>16</mn><mo stretchy="false" form="postfix">)</mo></mrow><annotation encoding="application/x-tex">s(k, l) = b(k, l) \vee (a(k, l) \ll 16)</annotation></semantics></math></p>
<h1 id="appendix">Appendix</h1>
<p>The definition of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mi>G</mi><annotation encoding="application/x-tex">G</annotation></semantics></math> as used by <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mo>CP32</mo><annotation encoding="application/x-tex">\operatorname{CP32}</annotation></semantics></math> is:</p>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mo stretchy="false" form="prefix">⟨</mo><annotation encoding="application/x-tex">\langle</annotation></semantics></math></p>
<pre><code>0x6b326ac4, 0x13f8e1bd, 0x1d61066f, 0x87733fc7, 0x37145391, 0x1c115e40,
0xd2ea17a3, 0x8650e4b1, 0xe892bb09, 0x408a0c3a, 0x3c40b72c, 0x2a988fb0,
0xf691d0f8, 0xb22072d9, 0x6fa8b705, 0x72bd6386, 0xdd905ac3, 0x7fcba0ba,
0x4f84a51c, 0x1dd8477e, 0x6f972f2c, 0xaccd018e, 0xe2964f13, 0x7a7d2388,
0xebf42ca7, 0xa8e2a0a2, 0x8eb726d3, 0xccd169b6, 0x5444f61e, 0xe178ad7a,
0xd556a18d, 0xbac80ef4, 0x34cb8a87, 0x7740a1a9, 0x62640fe1, 0xb1e64472,
0xdee2d6c8, 0x27849114, 0xb6333f4b, 0xbb0b5c1d, 0x57e53652, 0xfde51999,
0xef773313, 0x1bbaf941, 0x2e9aa084, 0x37587ab8, 0xa61e7c54, 0xb779be61,
0xd8795bfd, 0x1707c1f6, 0x50fe9c54, 0x32ff3685, 0x94f55c22, 0x2a32ce1a,
0x0b9076ab, 0x14363079, 0xae994b2c, 0x4a8da881, 0x4770b9c4, 0xf4d143dd,
0x70a90c0b, 0xa094582a, 0x4b254d10, 0x2454325e, 0x1725a589, 0x9a3380da,
0x948eeade, 0x79f88224, 0x7b8dc378, 0xc2090db6, 0x41f7a7ac, 0xd4d9528c,
0x7f0bace7, 0xd3157814, 0xd7757bc4, 0xb428db06, 0x2e2b1d02, 0x0499bcf5,
0x310f963e, 0xe5f31a83, 0xe0cd600f, 0x8b48af14, 0x568eb23a, 0x01d1150b,
0x33f54023, 0xa0e59fdf, 0x8d17c2dd, 0xfb7bd347, 0x4d8cd432, 0x664db8de,
0xd48f2a6c, 0x16c3412d, 0x873a32fc, 0x10796a21, 0xed40f0f8, 0x5ca8e9b2,
0x0f70d259, 0x0df532c2, 0x016d73aa, 0x45761aa5, 0x189b45a7, 0x4accd733,
0x641f90e3, 0x592ed9ee, 0x4b1d72ad, 0x42ff2cd4, 0x0654b609, 0x799012c0,
0x595f36a4, 0x082bdbd6, 0x0375ddd3, 0xc16c1fb5, 0x57492df8, 0xa2d56a98,
0xdfb2aa28, 0x3728f35f, 0xdc49ea71, 0x9aee8377, 0xd62de2ab, 0x2c3aa155,
0x407d9eed, 0xbc5b3832, 0x42961924, 0x1498172a, 0xc7126716, 0x95494b56,
0xd40442fb, 0xb22a3ed1, 0x0ad3e0ae, 0x77a6136a, 0xfb1bc3f0, 0x1a715c38,
0xccbbd21d, 0x061ff037, 0x85d700cb, 0x8a8fb396, 0x956bbe48, 0xf2556ed8,
0x3319c88b, 0xe0d6d3e9, 0x4783b316, 0x03a73543, 0x253be5ed, 0x41322aea,
0xdfc00c7a, 0x972b9413, 0xccca42f5, 0x0a1cdf35, 0xa2dc31b8, 0xf48397eb,
0xbe3f2b3e, 0xd2950b9f, 0xccd269cf, 0x51a64ca9, 0xea46d96e, 0xcaec892e,
0x3fae3a62, 0xf12e53db, 0x3753464c, 0x214fbd91, 0x609ce2f7, 0x6158b44c,
0xa74b8027, 0x79f36912, 0x16cac162, 0x5e76df4f, 0xbc4184fb, 0x912cac7d,
0xf97e5704, 0x664dd25f, 0x7d837805, 0x5386cfe0, 0x4e585d77, 0xa0fa527e,
0xeb5c8401, 0xa186cc51, 0x05ef3f1f, 0xc1efc774, 0x38730c2c, 0xad9c5539,
0x27cd4938, 0x7317b4f2, 0x852c186f, 0xa4c9b0f4, 0xf592f010, 0xf6fe86f3,
0xb14ba86c, 0x07109a27, 0x0d00568d, 0xd92ee49f, 0xdc643eb3, 0x8d81c333,
0xcd1d7bbd, 0x87ff9cda, 0x80fa4285, 0x25258d5b, 0xd9e4065a, 0x78955c18,
0x84874c2a, 0xfdae136b, 0x48eeb3d3, 0xc2623958, 0x5a74f96d, 0x0bcb49f5,
0x3041cefc, 0xa5b0a1a8, 0x2d29bae6, 0x916ace93, 0x0e70564d, 0xa24894ae,
0x9897044d, 0xcba97c2a, 0x52a313b1, 0x318ec481, 0xc4729ec1, 0xd90ad78a,
0x55eb9f90, 0x4f159fda, 0xa90fbd44, 0xd0ca6208, 0x5c597269, 0xe05a471e,
0x26a5e224, 0x97144944, 0xece2c486, 0xf65c9a9e, 0x82a3fbbb, 0x925d1a62,
0xd6c4c29b, 0x61b9292d, 0x161529c9, 0x37713240, 0x68ec933b, 0xed80a4e5,
0x02b2db41, 0x47cfd676, 0xbfe26b41, 0x5e8468bb, 0x6e0d15a4, 0x40383ef4,
0x81e622fb, 0x194b378c, 0x0c503af5, 0x8e0033a7, 0x003aaa5e, 0x9d7b6723,
0x0702e877, 0x34b75166, 0xd1ba98d8, 0x9b9f1794, 0xe8961c84, 0x9d773b17,
0xf9783ee9, 0xdff11758, 0x49bea2cf, 0xa0e0887f</code></pre>
<p><math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mo stretchy="false" form="postfix">⟩</mo><annotation encoding="application/x-tex">\rangle</annotation></semantics></math></p>