-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathSPEC.txt
56 lines (39 loc) · 2.27 KB
/
SPEC.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
-----
NILDB file format (version 1)
Author: Tiebing Zhang <[email protected]>,Adam Ierymenko <[email protected]>
http://creativecommons.org/publicdomain/zero/1.0/
-----
In keeping with the goal of minimalism the file format is very simple, the
sort of thing that would be given as an example in an introductory course in
data structures. It's a basic hash table that adds additional pages of hash
table entries on collision.
It consists of a 16 byte header followed by a hash tables and then data.
All integer values are stored in the big-endian word order (network order).
The header consists of the following fields:
[0-3] magic numbers: (ASCII) 'N', 'D', 'B', NILDB_VERSION (currently 1)
[4-7] 32-bit hash table size in entries
[8-11] 32-bit key size in bytes
[12-15] 32-bit value size in bytes
Hash tables is an arrays of [hash table size] 32-bit unsigned integers, as
data offset from the beginning of the data portion.
Hash table size is 4 * [hash table size].
Immediately following the header is the first key/value and other fields.
Structure of each key/value entry
[0] 1-byte flags. 0x01 means valid, 0x00 means deleted.
[1-5] 32-bit offset, from this entry to the next link-listed entry with the same hash
fixed-size key data
fixed-size value data
The algorithm for adding new entries is as follows:
(1) The key is hashed using a 64-bit variant of the DJB2 hash function, and
this is taken modulo hash table size to get a bucket number.
(2) The Hash table is checked to locate the first entry. If the hash table offset
is 0xFFFFFFFF, which indicates no entry, an entry is appened to the end of
the file and the hash table index is updated with the correct offset. If the
hash table offset is valid, then the link list is traversed until the
end. During the traverse, if the same key is found, the value is updated
and over. Otherwise, if an deleted entry is found, that entry is used for
the new entry, and flags set to valid. If no deleted entry found, an entry
is appened to the file, and the last link list entry offset is updated.
Lookup of a key/value pair follows a simiar method of traversing the link list.
Deleting an entry simply updates the entry's active bit to "deleted", so that
the space can be reused later. The file will never shrink.