Commit 73a56da

Merge pull request #24 from hashsplit/cp32
Specify buzhash
2 parents a9d62bd + 9e0af82 commit 73a56da

1 file changed

+127
-4
lines changed

spec.md

Lines changed: 127 additions & 4 deletions
@@ -35,18 +35,22 @@ This section discusses notation used in this specification.
3535

3636
We define the following sets:
3737

38-
- $U_{32}$, The set of integers in the range $[0, 2^{32})$
38+
- $U_{32}$, The set of integers in the range $[0, 2^{32})$.
3939
- $U_8$, The set of integers in the range $[0, 2^8)$, aka bytes.
4040
- $V_8$, The set of *sequences* of bytes, i.e. sequences of
4141
$U_8$.
4242
- $V_v$, The set of *sequences* of *sequences* of bytes, i.e.
4343
sequences of elements of $V_8$.
44+
- $V_{32}$, The set of sequences of elements of $U_{32}$.
4445

4546
All arithmetic operations in this document are implicitly performed
4647
modulo $2^{32}$. We use standard mathematical notation for addition,
4748
subtraction, multiplication, and exponentiation. Division always
4849
denotes integer division, i.e. any remainder is dropped.
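
As a minimal sketch of the arithmetic convention above, in a language with unbounded integers such as Python each result can be reduced modulo $2^{32}$ by masking. The helper names (`add32`, `sub32`, `mul32`, `div32`) are illustrative assumptions and do not appear in this specification.

```python
MASK32 = 0xFFFFFFFF  # 2**32 - 1; masking with it reduces a value modulo 2**32


def add32(x, y):
    """Addition modulo 2**32."""
    return (x + y) & MASK32


def sub32(x, y):
    """Subtraction modulo 2**32 (a negative result wraps around)."""
    return (x - y) & MASK32


def mul32(x, y):
    """Multiplication modulo 2**32."""
    return (x * y) & MASK32


def div32(x, y):
    """Integer division; any remainder is dropped."""
    return x // y
```

For example, `add32(0xFFFFFFFF, 1)` wraps around to `0`, and `div32(7, 2)` is `3`.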
4950

51+
Numerals starting with the prefix `0x` are hexadecimal, e.g. `0xfe`
52+
for the (decimal) number 254.
53+
5054
We use the notation $\langle X_0, X_1, \dots, X_k \rangle$ to denote
5155
an ordered sequence of values.
5256

@@ -56,18 +60,38 @@ elements it contains.
5660
We also use the following operators and functions:
5761

5862
- $x \wedge y$ denotes the bitwise AND of $x$ and $y$
59-
- $x \vee y$ denotes the bitwise OR of $x$ and $y$
63+
- $x \vee y$ denotes the bitwise *inclusive* OR of $x$ and $y$
64+
- $x \oplus y$ denotes the bitwise *exclusive* OR of $x$ and $y$
6065
- $x \ll n$ denotes shifting $x$ to the left $n$ bits, i.e.
6166
$x \ll n = x2^{n}$
6267
- $x \gg n$ denotes a *logical* right shift -- it shifts $x$ to the
6368
right by $n$ bits, i.e. $x \gg n = x / 2^n$
64-
- $X \mathbin{\|} Y$ denotes the concatenation of two sequences $X$ and $Y$,
69+
- $X \mathbin{\|} Y$ denotes the concatenation of two sequences $X$ and
70+
$Y$,
6571
i.e. if $X = \langle X_0, \dots, X_N \rangle$ and $Y = \langle Y_0,
6672
\dots, Y_M \rangle$ then $X \mathbin{\|} Y = \langle X_0, \dots, X_N, Y_0, \dots, Y_M
6773
\rangle$
68-
- $\min(x, y)$ denotes the minimum of $x$ and $y$ and $\max(x, y)$ denotes the maximum
74+
- $\min(x, y)$ denotes the minimum of $x$ and $y$ and $\max(x, y)$
75+
denotes the maximum
76+
- $\operatorname{ROT}_L(x, n)$ denotes the rotation of $x$ to the left
77+
by $n$ bits, i.e. $\operatorname{ROT}_L(x, n) = (x \ll n) \vee (x \gg
78+
(32 - n))$
6979
- $\operatorname{Type}(x)$ denotes the type of $x$.
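
The shift, rotation and concatenation operators above can be sketched in Python as follows. The function names are illustrative assumptions. Reducing the rotation amount modulo 32 is also an assumption made here so that the function accepts any non-negative $n$; the formula above is only written for $0 \le n \le 32$.

```python
MASK32 = 0xFFFFFFFF  # keeps results inside U_32


def shl(x, n):
    """x << n, i.e. x * 2**n, truncated to 32 bits."""
    return (x << n) & MASK32


def shr(x, n):
    """Logical right shift, i.e. x / 2**n with the remainder dropped."""
    return x >> n


def rotl(x, n):
    """ROT_L(x, n) = (x << n) | (x >> (32 - n)) on a 32-bit value."""
    n %= 32  # assumption: rotation amounts are taken modulo 32
    return shl(x, n) | shr(x, 32 - n)


def concat(xs, ys):
    """Concatenation of two sequences represented as Python lists."""
    return xs + ys
```

For example, `rotl(0x12345678, 8)` moves the top byte to the bottom and gives `0x34567812`.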
7080

81+
We use standard mathematical notation for summation. For example:
82+
83+
$\sum_{i = 0}^{n} i$
84+
85+
denotes the sum of integers in the range $[0, n]$.
86+
87+
We define a similar notation for exclusive or:
88+
89+
$\bigoplus_{i = 0}^{n} i$
90+
91+
denotes the bitwise exclusive or of the integers in $[0, n]$, i.e.
92+
93+
$\bigoplus_{i = 0}^{n} i = 0 \oplus 1 \oplus \dots \oplus n$
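
As a small illustration of the $\bigoplus$ notation, the following sketch folds exclusive or over the integers in $[0, n]$. The function name `xor_range` is an assumption used only for this example.

```python
from functools import reduce
from operator import xor


def xor_range(n):
    """Bitwise exclusive or of the integers 0, 1, ..., n."""
    return reduce(xor, range(n + 1), 0)
```

For example, `xor_range(2)` is `0 ^ 1 ^ 2`, which is `3`.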
94+
7195
Finally, we define the “prefix” $\mathbb{P}_q(X)$
7296
of a non-empty sequence $X$
7397
with respect to a given predicate $q$
@@ -276,6 +300,53 @@ To “close” a node $N_i$:
276300

277301
# Rolling Hash Functions
278302

303+
## CP32
304+
305+
The `cp32` hash function is based on cyclic polynomials. The family of
306+
related functions is sometimes also called "buzhash." `cp32` is the
307+
recommended hash function for use with hashsplit; use it unless you have
308+
clear reasons for doing otherwise.
309+
310+
### Definition
311+
312+
We define the function $\operatorname{CP32} \in V_8 \rightarrow U_{32}$
313+
as:
314+
315+
$\operatorname{CP32}(X) = \bigoplus_{i = 0}^{|X| - 1}
316+
\operatorname{ROT}_L(g(X_i), |X| - i - 1)$
317+
318+
Where $g(n) = G_n$ and the sequence $G \in V_{32}$ is defined in the
319+
appendix.
320+
321+
The sequence $G$ was chosen at random. Note that $|G| = 256$, so
322+
$g(n)$ is always defined.
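
A direct, non-rolling computation can be sketched as below. Two assumptions are made and should be checked against the definition before relying on this code: the byte at index $i$ is rotated by $|X| - i - 1$ bits, so that the newest byte is unrotated as the rolling form of the hash requires, and rotation amounts are reduced modulo 32. The table `PLACEHOLDER_G` is a hypothetical stand-in that keeps the example short; a real implementation must use the 256 values of $G$ from the appendix.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical stand-in table for illustration ONLY. It is NOT the sequence G
# from the appendix; substitute the real 256 values in practice.
PLACEHOLDER_G = [(i * 2654435761 + 0x9E3779B9) & MASK32 for i in range(256)]


def rotl32(x, n):
    """Rotate the 32-bit value x left by n bits (n taken modulo 32)."""
    n %= 32
    return ((x << n) & MASK32) | (x >> (32 - n))


def cp32(data, table=PLACEHOLDER_G):
    """XOR together the rotated table entries for each byte of data."""
    h = 0
    size = len(data)
    for i, byte in enumerate(data):
        # Assumed rotation amount: |X| - i - 1 (newest byte unrotated).
        h ^= rotl32(table[byte], size - i - 1)
    return h
```

With these assumptions, the hash of a single byte `b` is simply `table[b]`, and the hash of the empty sequence is `0`.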
323+
324+
### Implementation
325+
326+
#### Rolling
327+
328+
$\operatorname{CP32}$ can be computed in a rolling fashion; for
329+
sequences
330+
331+
$X = \langle X_0, \dots, X_N \rangle$
332+
333+
and
334+
335+
$Y = \langle X_1, \dots, X_N, y \rangle$
336+
337+
given $\operatorname{CP32}(X)$, $X_0$ and $y$, we can compute
338+
$\operatorname{CP32}(Y)$ as:
339+
340+
$\operatorname{CP32}(Y) = \operatorname{ROT}_L(\operatorname{CP32}(X),
341+
1) \oplus \operatorname{ROT}_L(g(X_0), |X| \mod 32) \oplus g(y)$.
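
The update step above can be sketched as a single function and checked against a direct computation over each window. The names `cp32_roll` and `cp32_direct` and the table `PLACEHOLDER_G` are assumptions for this example; the table is a stand-in, not the sequence $G$ from the appendix. The direct reference assumes the newest byte of a window is unrotated, which is what the update step implies.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical stand-in table, NOT the spec's G (see the appendix for G).
PLACEHOLDER_G = [(i * 2654435761 + 0x9E3779B9) & MASK32 for i in range(256)]


def rotl32(x, n):
    """Rotate the 32-bit value x left by n bits (n taken modulo 32)."""
    n %= 32
    return ((x << n) & MASK32) | (x >> (32 - n))


def cp32_direct(data, table):
    """Reference computation; assumes byte i is rotated by len(data) - i - 1."""
    h = 0
    for i, byte in enumerate(data):
        h ^= rotl32(table[byte], len(data) - i - 1)
    return h


def cp32_roll(prev_hash, window_len, outgoing, incoming, table):
    """Hash of the window after dropping `outgoing` and appending `incoming`."""
    return (rotl32(prev_hash, 1)
            ^ rotl32(table[outgoing], window_len % 32)
            ^ table[incoming])
```

A caller would compute the hash of the first window directly and then call `cp32_roll` once per additional input byte, passing the byte that leaves the window and the byte that enters it.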
342+
343+
Note that the splitting algorithm only computes hashes on sequences of
344+
size $W = 64$, and since 64 is a multiple of 32 this means that for the
345+
purposes of splitting, the above can be simplified to:
346+
347+
$\operatorname{CP32}(Y) = \operatorname{ROT}_L(\operatorname{CP32}(X),
348+
1) \oplus g(X_0) \oplus g(y)$.
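
The simplified form can be sketched as a generator that yields the hash of every full window of a byte string. The function name `rolling_hashes` and the table `PLACEHOLDER_G` are assumptions for this example, and the table is a stand-in rather than the sequence $G$ from the appendix. How a splitter treats input shorter than one window is outside the scope of this sketch; here such input simply yields nothing.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical stand-in table, NOT the spec's G (see the appendix for G).
PLACEHOLDER_G = [(i * 2654435761 + 0x9E3779B9) & MASK32 for i in range(256)]


def rotl32(x, n):
    """Rotate the 32-bit value x left by n bits (n taken modulo 32)."""
    n %= 32
    return ((x << n) & MASK32) | (x >> (32 - n))


def rolling_hashes(data, table, w=64):
    """Yield the hash of each w-byte window of data, oldest window first."""
    if w % 32 != 0:
        raise ValueError("the simplified update needs w to be a multiple of 32")
    if len(data) < w:
        return
    # Build the first window byte by byte; the newest byte ends up unrotated.
    h = 0
    for byte in data[:w]:
        h = rotl32(h, 1) ^ table[byte]
    yield h
    # Slide the window: the outgoing byte's entry needs no extra rotation.
    for i in range(w, len(data)):
        h = rotl32(h, 1) ^ table[data[i - w]] ^ table[data[i]]
        yield h
```

Because the window size is a multiple of 32, the entry for the outgoing byte has been rotated a whole number of full turns by the time it leaves the window, which is why it can be XORed out without any rotation.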
349+
279350
## The RRS Rolling Checksums
280351

281352
The `rrs` family of checksums is based on an algorithm first used
@@ -343,6 +414,58 @@ operators:
343414

344415
$s(k, l) = b(k, l) \vee (a(k, l) \ll 16)$
345416

417+
# Appendix
418+
419+
The definition of $G$ as used by $\operatorname{CP32}$ is:
420+
421+
$\langle$
422+
```
423+
0x6b326ac4, 0x13f8e1bd, 0x1d61066f, 0x87733fc7, 0x37145391, 0x1c115e40,
424+
0xd2ea17a3, 0x8650e4b1, 0xe892bb09, 0x408a0c3a, 0x3c40b72c, 0x2a988fb0,
425+
0xf691d0f8, 0xb22072d9, 0x6fa8b705, 0x72bd6386, 0xdd905ac3, 0x7fcba0ba,
426+
0x4f84a51c, 0x1dd8477e, 0x6f972f2c, 0xaccd018e, 0xe2964f13, 0x7a7d2388,
427+
0xebf42ca7, 0xa8e2a0a2, 0x8eb726d3, 0xccd169b6, 0x5444f61e, 0xe178ad7a,
428+
0xd556a18d, 0xbac80ef4, 0x34cb8a87, 0x7740a1a9, 0x62640fe1, 0xb1e64472,
429+
0xdee2d6c8, 0x27849114, 0xb6333f4b, 0xbb0b5c1d, 0x57e53652, 0xfde51999,
430+
0xef773313, 0x1bbaf941, 0x2e9aa084, 0x37587ab8, 0xa61e7c54, 0xb779be61,
431+
0xd8795bfd, 0x1707c1f6, 0x50fe9c54, 0x32ff3685, 0x94f55c22, 0x2a32ce1a,
432+
0x0b9076ab, 0x14363079, 0xae994b2c, 0x4a8da881, 0x4770b9c4, 0xf4d143dd,
433+
0x70a90c0b, 0xa094582a, 0x4b254d10, 0x2454325e, 0x1725a589, 0x9a3380da,
434+
0x948eeade, 0x79f88224, 0x7b8dc378, 0xc2090db6, 0x41f7a7ac, 0xd4d9528c,
435+
0x7f0bace7, 0xd3157814, 0xd7757bc4, 0xb428db06, 0x2e2b1d02, 0x0499bcf5,
436+
0x310f963e, 0xe5f31a83, 0xe0cd600f, 0x8b48af14, 0x568eb23a, 0x01d1150b,
437+
0x33f54023, 0xa0e59fdf, 0x8d17c2dd, 0xfb7bd347, 0x4d8cd432, 0x664db8de,
438+
0xd48f2a6c, 0x16c3412d, 0x873a32fc, 0x10796a21, 0xed40f0f8, 0x5ca8e9b2,
439+
0x0f70d259, 0x0df532c2, 0x016d73aa, 0x45761aa5, 0x189b45a7, 0x4accd733,
440+
0x641f90e3, 0x592ed9ee, 0x4b1d72ad, 0x42ff2cd4, 0x0654b609, 0x799012c0,
441+
0x595f36a4, 0x082bdbd6, 0x0375ddd3, 0xc16c1fb5, 0x57492df8, 0xa2d56a98,
442+
0xdfb2aa28, 0x3728f35f, 0xdc49ea71, 0x9aee8377, 0xd62de2ab, 0x2c3aa155,
443+
0x407d9eed, 0xbc5b3832, 0x42961924, 0x1498172a, 0xc7126716, 0x95494b56,
444+
0xd40442fb, 0xb22a3ed1, 0x0ad3e0ae, 0x77a6136a, 0xfb1bc3f0, 0x1a715c38,
445+
0xccbbd21d, 0x061ff037, 0x85d700cb, 0x8a8fb396, 0x956bbe48, 0xf2556ed8,
446+
0x3319c88b, 0xe0d6d3e9, 0x4783b316, 0x03a73543, 0x253be5ed, 0x41322aea,
447+
0xdfc00c7a, 0x972b9413, 0xccca42f5, 0x0a1cdf35, 0xa2dc31b8, 0xf48397eb,
448+
0xbe3f2b3e, 0xd2950b9f, 0xccd269cf, 0x51a64ca9, 0xea46d96e, 0xcaec892e,
449+
0x3fae3a62, 0xf12e53db, 0x3753464c, 0x214fbd91, 0x609ce2f7, 0x6158b44c,
450+
0xa74b8027, 0x79f36912, 0x16cac162, 0x5e76df4f, 0xbc4184fb, 0x912cac7d,
451+
0xf97e5704, 0x664dd25f, 0x7d837805, 0x5386cfe0, 0x4e585d77, 0xa0fa527e,
452+
0xeb5c8401, 0xa186cc51, 0x05ef3f1f, 0xc1efc774, 0x38730c2c, 0xad9c5539,
453+
0x27cd4938, 0x7317b4f2, 0x852c186f, 0xa4c9b0f4, 0xf592f010, 0xf6fe86f3,
454+
0xb14ba86c, 0x07109a27, 0x0d00568d, 0xd92ee49f, 0xdc643eb3, 0x8d81c333,
455+
0xcd1d7bbd, 0x87ff9cda, 0x80fa4285, 0x25258d5b, 0xd9e4065a, 0x78955c18,
456+
0x84874c2a, 0xfdae136b, 0x48eeb3d3, 0xc2623958, 0x5a74f96d, 0x0bcb49f5,
457+
0x3041cefc, 0xa5b0a1a8, 0x2d29bae6, 0x916ace93, 0x0e70564d, 0xa24894ae,
458+
0x9897044d, 0xcba97c2a, 0x52a313b1, 0x318ec481, 0xc4729ec1, 0xd90ad78a,
459+
0x55eb9f90, 0x4f159fda, 0xa90fbd44, 0xd0ca6208, 0x5c597269, 0xe05a471e,
460+
0x26a5e224, 0x97144944, 0xece2c486, 0xf65c9a9e, 0x82a3fbbb, 0x925d1a62,
461+
0xd6c4c29b, 0x61b9292d, 0x161529c9, 0x37713240, 0x68ec933b, 0xed80a4e5,
462+
0x02b2db41, 0x47cfd676, 0xbfe26b41, 0x5e8468bb, 0x6e0d15a4, 0x40383ef4,
463+
0x81e622fb, 0x194b378c, 0x0c503af5, 0x8e0033a7, 0x003aaa5e, 0x9d7b6723,
464+
0x0702e877, 0x34b75166, 0xd1ba98d8, 0x9b9f1794, 0xe8961c84, 0x9d773b17,
465+
0xf9783ee9, 0xdff11758, 0x49bea2cf, 0xa0e0887f
466+
```
467+
$\rangle$
468+
346469
[rsync]: https://rsync.samba.org/tech_report/node3.html
347470
[bup]: https://bup.github.io/
348471
[perkeep]: https://perkeep.org/
