@@ -35,18 +35,22 @@ This section discusses notation used in this specification.
We define the following sets:

- - $U_{32}$, The set of integers in the range $[0, 2^{32})$
+ - $U_{32}$, The set of integers in the range $[0, 2^{32})$.
- $U_8$, The set of integers in the range $[0, 2^8)$, aka bytes.
- $V_8$, The set of *sequences* of bytes, i.e. sequences of
$U_8$.
- $V_v$, The set of *sequences* of *sequences* of bytes, i.e.
sequences of elements of $V_8$.
+ - $V_{32}$, The set of sequences of elements of $U_{32}$.

All arithmetic operations in this document are implicitly performed
modulo $2^{32}$. We use standard mathematical notation for addition,
subtraction, multiplication, and exponentiation. Division always
denotes integer division, i.e. any remainder is dropped.

+ Numerals starting with the prefix `0x` are hexadecimal, e.g. `0xfe`
+ for the (decimal) number 254.
+
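As an informal illustration of the arithmetic conventions above, the following Python sketch wraps each operation modulo $2^{32}$ and uses floor division. The helper names (`add`, `sub`, `mul`, `div`) are hypothetical and are not part of this specification.

```python
MOD = 2 ** 32  # all arithmetic is performed modulo 2^32


def add(x: int, y: int) -> int:
    """Addition modulo 2^32."""
    return (x + y) % MOD


def sub(x: int, y: int) -> int:
    """Subtraction modulo 2^32; a negative difference wraps around."""
    return (x - y) % MOD


def mul(x: int, y: int) -> int:
    """Multiplication modulo 2^32."""
    return (x * y) % MOD


def div(x: int, y: int) -> int:
    """Integer division; any remainder is dropped."""
    return x // y
```

For example, under these assumptions `add(0xffffffff, 1)` wraps to 0, `sub(0, 1)` yields `0xffffffff`, and `div(7, 2)` yields 3.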
We use the notation $\langle X_0, X_1, \dots, X_k \rangle$ to denote
an ordered sequence of values.

@@ -56,18 +60,38 @@ elements it contains.
We also use the following operators and functions:

- $x \wedge y$ denotes the bitwise AND of $x$ and $y$
- - $x \vee y$ denotes the bitwise OR of $x$ and $y$
+ - $x \vee y$ denotes the bitwise *inclusive* OR of $x$ and $y$
+ - $x \oplus y$ denotes the bitwise *exclusive* OR of $x$ and $y$
- $x \ll n$ denotes shifting $x$ to the left $n$ bits, i.e.
$x \ll n = x2^{n}$
- $x \gg n$ denotes a *logical* right shift -- it shifts $x$ to the
right by $n$ bits, i.e. $x \gg n = x / 2^n$
- - $X \mathbin{\|} Y$ denotes the concatenation of two sequences $X$ and $Y$,
+ - $X \mathbin{\|} Y$ denotes the concatenation of two sequences $X$ and
+ $Y$,
i.e. if $X = \langle X_0, \dots, X_N \rangle$ and $Y = \langle Y_0,
\dots, Y_M \rangle$ then $X \mathbin{\|} Y = \langle X_0, \dots, X_N, Y_0, \dots, Y_M
\rangle$
- - $\min(x, y)$ denotes the minimum of $x$ and $y$ and $\max(x, y)$ denotes the maximum
+ - $\min(x, y)$ denotes the minimum of $x$ and $y$ and $\max(x, y)$
+ denotes the maximum
+ - $\operatorname{ROT}_L(x, n)$ denotes the rotation of $x$ to the left
+ by $n$ bits, i.e. $\operatorname{ROT}_L(x, n) = (x \ll n) \vee (x \gg
+ (32 - n))$
- $\operatorname{Type}(x)$ denotes the type of $x$.

+ We use standard mathematical notation for summation. For example:
+
+ $\sum_{i = 0}^{n} i$
+
+ denotes the sum of integers in the range $[0, n]$.
+
+ We define a similar notation for exclusive or:
+
+ $\bigoplus_{i = 0}^{n} i$
+
+ denotes the bitwise exclusive or of the integers in $[0, n]$, i.e.
+
+ $\bigoplus_{i = 0}^{n} i = 0 \oplus 1 \oplus \dots \oplus n$
+
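The rotation operator and the exclusive-or fold described above can be sketched in Python as follows. The names `rot_l` and `xor_fold` are hypothetical, and reducing the rotation amount modulo 32 is an assumption of this sketch rather than something the operator definition states.

```python
from functools import reduce
from operator import xor

MASK32 = 0xFFFFFFFF  # keeps results inside U_32


def rot_l(x: int, n: int) -> int:
    """Rotate the 32-bit value x left by n bits.

    Assumption: n is reduced modulo 32 so that larger amounts wrap.
    """
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def xor_fold(n: int) -> int:
    """Bitwise exclusive or of the integers in [0, n]."""
    return reduce(xor, range(n + 1), 0)
```

Under these assumptions, `rot_l(0x80000001, 1)` moves the top bit around to the bottom and gives `0x00000003`, while `xor_fold(3)` is `0 ^ 1 ^ 2 ^ 3`, which is 0.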
Finally, we define the “prefix” $\mathbb{P}_q(X)$
of a non-empty sequence $X$
with respect to a given predicate $q$
@@ -276,6 +300,53 @@ To “close” a node $N_i$:
# Rolling Hash Functions

+ ## CP32
+
+ The `cp32` hash function is based on cyclic polynomials. The family of
+ related functions is sometimes also called "buzhash." `cp32` is the
+ recommended hash function for use with hashsplit; use it unless you have
+ clear reasons for doing otherwise.
+
+ ### Definition
+
+ We define the function $\operatorname{CP32} \in V_8 \rightarrow U_{32}$
+ as:
+
+ $\operatorname{CP32}(X) = \bigoplus_{i = 0}^{|X| - 1}
+ \operatorname{ROT}_L(g(X_i), (|X| - i - 1) \bmod 32)$
+
+ where $g(n) = G_n$ and the sequence $G \in V_{32}$ is defined in the
+ appendix. The last element of $X$ is therefore not rotated at all,
+ which is what the rolling update below relies on.
+
+ The sequence $G$ was chosen at random. Note that $|G| = 256$, so
+ $g(n)$ is defined for every byte $n \in U_8$.
+
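A direct, non-rolling evaluation of the definition might look like the Python sketch below. It assumes that byte $i$ is rotated by $(|X| - i - 1) \bmod 32$ bits, so that the last byte is not rotated, which is the convention under which the rolling update in the Rolling subsection holds. The table `G` here is a placeholder of pseudo-random values and is not the sequence from the appendix; `rot_l` and `cp32` are hypothetical names.

```python
import random

MASK32 = 0xFFFFFFFF

# Placeholder table of 256 pseudo-random 32-bit values.
# This is NOT the sequence G from the appendix; substitute the real
# values to obtain actual CP32 hashes.
_rng = random.Random(0)
G = [_rng.getrandbits(32) for _ in range(256)]


def rot_l(x: int, n: int) -> int:
    """Rotate the 32-bit value x left by n bits (n reduced modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def cp32(data: bytes) -> int:
    """Evaluate the cyclic-polynomial hash of data directly.

    Assumption: byte i is rotated by (len(data) - i - 1) mod 32 bits.
    """
    h = 0
    size = len(data)
    for i, byte in enumerate(data):
        h ^= rot_l(G[byte], size - i - 1)
    return h
```

With this convention the hash of a single byte `b` is simply `G[b]`, and the hash of the empty sequence is 0.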
+ ### Implementation
+
+ #### Rolling
+
+ $\operatorname{CP32}$ can be computed in a rolling fashion. Consider the
+ sequences
+
+ $X = \langle X_0, \dots, X_N \rangle$
+
+ and
+
+ $Y = \langle X_1, \dots, X_N, y \rangle$
+
+ Given $\operatorname{CP32}(X)$, $X_0$ and $y$, we can compute
+ $\operatorname{CP32}(Y)$ as:
+
+ $\operatorname{CP32}(Y) = \operatorname{ROT}_L(\operatorname{CP32}(X),
+ 1) \oplus \operatorname{ROT}_L(g(X_0), |X| \bmod 32) \oplus g(y)$.
+
+ Note that the splitting algorithm only computes hashes on sequences of
+ size $W = 64$, and since 64 is a multiple of 32 this means that for the
+ purposes of splitting, the above can be simplified to:
+
+ $\operatorname{CP32}(Y) = \operatorname{ROT}_L(\operatorname{CP32}(X),
+ 1) \oplus g(X_0) \oplus g(y)$.
+
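The rolling update above can be checked against a direct evaluation with a short Python sketch. As before, `G` is a placeholder table and not the sequence from the appendix, the names `rot_l`, `cp32` and `cp32_roll` are hypothetical, and the direct evaluation assumes that the last byte of the window is not rotated.

```python
import random

MASK32 = 0xFFFFFFFF

# Placeholder table; NOT the sequence G from the appendix.
_rng = random.Random(0)
G = [_rng.getrandbits(32) for _ in range(256)]


def rot_l(x: int, n: int) -> int:
    """Rotate the 32-bit value x left by n bits (n reduced modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def cp32(data: bytes) -> int:
    """Direct evaluation; byte i is rotated by (len(data) - i - 1) mod 32."""
    h = 0
    for i, byte in enumerate(data):
        h ^= rot_l(G[byte], len(data) - i - 1)
    return h


def cp32_roll(prev_hash: int, outgoing: int, incoming: int, window: int) -> int:
    """Slide the window one byte: drop `outgoing`, append `incoming`."""
    return rot_l(prev_hash, 1) ^ rot_l(G[outgoing], window) ^ G[incoming]


def rolling_hashes(data: bytes, window: int) -> list:
    """Hash every window of `data` using the rolling update."""
    h = cp32(data[:window])
    out = [h]
    for k in range(1, len(data) - window + 1):
        h = cp32_roll(h, data[k - 1], data[k + window - 1], window)
        out.append(h)
    return out
```

When `window` is 64, `rot_l(G[outgoing], window)` reduces to `G[outgoing]`, which matches the simplified form given for splitting.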
## The RRS Rolling Checksums

The `rrs` family of checksums is based on an algorithm first used
@@ -343,6 +414,58 @@ operators:
$s(k, l) = b(k, l) \vee (a(k, l) \ll 16)$

+ # Appendix
+
+ The definition of $G$ as used by $\operatorname{CP32}$ is:
+
+ $\langle$
+ ```
+ 0x6b326ac4, 0x13f8e1bd, 0x1d61066f, 0x87733fc7, 0x37145391, 0x1c115e40,
+ 0xd2ea17a3, 0x8650e4b1, 0xe892bb09, 0x408a0c3a, 0x3c40b72c, 0x2a988fb0,
+ 0xf691d0f8, 0xb22072d9, 0x6fa8b705, 0x72bd6386, 0xdd905ac3, 0x7fcba0ba,
+ 0x4f84a51c, 0x1dd8477e, 0x6f972f2c, 0xaccd018e, 0xe2964f13, 0x7a7d2388,
+ 0xebf42ca7, 0xa8e2a0a2, 0x8eb726d3, 0xccd169b6, 0x5444f61e, 0xe178ad7a,
+ 0xd556a18d, 0xbac80ef4, 0x34cb8a87, 0x7740a1a9, 0x62640fe1, 0xb1e64472,
+ 0xdee2d6c8, 0x27849114, 0xb6333f4b, 0xbb0b5c1d, 0x57e53652, 0xfde51999,
+ 0xef773313, 0x1bbaf941, 0x2e9aa084, 0x37587ab8, 0xa61e7c54, 0xb779be61,
+ 0xd8795bfd, 0x1707c1f6, 0x50fe9c54, 0x32ff3685, 0x94f55c22, 0x2a32ce1a,
+ 0x0b9076ab, 0x14363079, 0xae994b2c, 0x4a8da881, 0x4770b9c4, 0xf4d143dd,
+ 0x70a90c0b, 0xa094582a, 0x4b254d10, 0x2454325e, 0x1725a589, 0x9a3380da,
+ 0x948eeade, 0x79f88224, 0x7b8dc378, 0xc2090db6, 0x41f7a7ac, 0xd4d9528c,
+ 0x7f0bace7, 0xd3157814, 0xd7757bc4, 0xb428db06, 0x2e2b1d02, 0x0499bcf5,
+ 0x310f963e, 0xe5f31a83, 0xe0cd600f, 0x8b48af14, 0x568eb23a, 0x01d1150b,
+ 0x33f54023, 0xa0e59fdf, 0x8d17c2dd, 0xfb7bd347, 0x4d8cd432, 0x664db8de,
+ 0xd48f2a6c, 0x16c3412d, 0x873a32fc, 0x10796a21, 0xed40f0f8, 0x5ca8e9b2,
+ 0x0f70d259, 0x0df532c2, 0x016d73aa, 0x45761aa5, 0x189b45a7, 0x4accd733,
+ 0x641f90e3, 0x592ed9ee, 0x4b1d72ad, 0x42ff2cd4, 0x0654b609, 0x799012c0,
+ 0x595f36a4, 0x082bdbd6, 0x0375ddd3, 0xc16c1fb5, 0x57492df8, 0xa2d56a98,
+ 0xdfb2aa28, 0x3728f35f, 0xdc49ea71, 0x9aee8377, 0xd62de2ab, 0x2c3aa155,
+ 0x407d9eed, 0xbc5b3832, 0x42961924, 0x1498172a, 0xc7126716, 0x95494b56,
+ 0xd40442fb, 0xb22a3ed1, 0x0ad3e0ae, 0x77a6136a, 0xfb1bc3f0, 0x1a715c38,
+ 0xccbbd21d, 0x061ff037, 0x85d700cb, 0x8a8fb396, 0x956bbe48, 0xf2556ed8,
+ 0x3319c88b, 0xe0d6d3e9, 0x4783b316, 0x03a73543, 0x253be5ed, 0x41322aea,
+ 0xdfc00c7a, 0x972b9413, 0xccca42f5, 0x0a1cdf35, 0xa2dc31b8, 0xf48397eb,
+ 0xbe3f2b3e, 0xd2950b9f, 0xccd269cf, 0x51a64ca9, 0xea46d96e, 0xcaec892e,
+ 0x3fae3a62, 0xf12e53db, 0x3753464c, 0x214fbd91, 0x609ce2f7, 0x6158b44c,
+ 0xa74b8027, 0x79f36912, 0x16cac162, 0x5e76df4f, 0xbc4184fb, 0x912cac7d,
+ 0xf97e5704, 0x664dd25f, 0x7d837805, 0x5386cfe0, 0x4e585d77, 0xa0fa527e,
+ 0xeb5c8401, 0xa186cc51, 0x05ef3f1f, 0xc1efc774, 0x38730c2c, 0xad9c5539,
+ 0x27cd4938, 0x7317b4f2, 0x852c186f, 0xa4c9b0f4, 0xf592f010, 0xf6fe86f3,
+ 0xb14ba86c, 0x07109a27, 0x0d00568d, 0xd92ee49f, 0xdc643eb3, 0x8d81c333,
+ 0xcd1d7bbd, 0x87ff9cda, 0x80fa4285, 0x25258d5b, 0xd9e4065a, 0x78955c18,
+ 0x84874c2a, 0xfdae136b, 0x48eeb3d3, 0xc2623958, 0x5a74f96d, 0x0bcb49f5,
+ 0x3041cefc, 0xa5b0a1a8, 0x2d29bae6, 0x916ace93, 0x0e70564d, 0xa24894ae,
+ 0x9897044d, 0xcba97c2a, 0x52a313b1, 0x318ec481, 0xc4729ec1, 0xd90ad78a,
+ 0x55eb9f90, 0x4f159fda, 0xa90fbd44, 0xd0ca6208, 0x5c597269, 0xe05a471e,
+ 0x26a5e224, 0x97144944, 0xece2c486, 0xf65c9a9e, 0x82a3fbbb, 0x925d1a62,
+ 0xd6c4c29b, 0x61b9292d, 0x161529c9, 0x37713240, 0x68ec933b, 0xed80a4e5,
+ 0x02b2db41, 0x47cfd676, 0xbfe26b41, 0x5e8468bb, 0x6e0d15a4, 0x40383ef4,
+ 0x81e622fb, 0x194b378c, 0x0c503af5, 0x8e0033a7, 0x003aaa5e, 0x9d7b6723,
+ 0x0702e877, 0x34b75166, 0xd1ba98d8, 0x9b9f1794, 0xe8961c84, 0x9d773b17,
+ 0xf9783ee9, 0xdff11758, 0x49bea2cf, 0xa0e0887f
+ ```
+ $\rangle$
+
[rsync]: https://rsync.samba.org/tech_report/node3.html
[bup]: https://bup.github.io/
[perkeep]: https://perkeep.org/