docs/spec.md
+173-24
@@ -1,22 +1,72 @@
1
-
# Serialization specification
1
+
# Serialization Specification
2
2
3
-
*NOTE*: Serialization is done by `bincode_derive` by default. If you enable the `serde` flag, serialization with `serde-derive` is supported as well. `serde-derive` has the same guarantees as `bincode_derive` for now.
3
+
_NOTE_: This specification is primarily defined in the context of Rust, but aims to be implementable across different programming languages.
4
4
5
-
Related issue: <https://github.com/serde-rs/serde/issues/1756#issuecomment-689682123>
5
+
## Definitions
6
6
7
-
## Endian
7
+
- **Variant**: A specific constructor or case of an enum type.
8
+
- **Variant Payload**: The associated data of a specific enum variant.
9
+
- **Discriminant**: A unique identifier for an enum variant, typically represented as an integer.
10
+
- **Basic Types**: Primitive types that have a direct, well-defined binary representation.
8
11
9
-
By default `bincode` will serialize values in little endian encoding. This can be overwritten in the `Config`.
12
+
## Endianness
10
13
11
-
## Basic types
14
+
By default, this serialization format uses little-endian byte order for basic numeric types. This means multi-byte values are encoded with their least significant byte first.
12
15
13
-
Boolean types are encoded with 1 byte for each boolean type, with `0` being `false`, `1` being true. Whilst deserializing every other value will throw an error.
16
+
Endianness can be configured with the following methods, allowing for big-endian serialization when required:
14
17
15
-
All basic numeric types will be encoded based on the configured [IntEncoding](#intencoding).
All floating point types will take up exactly 4 (for `f32`) or 8 (for `f64`) bytes.
21
+
### Byte Order Considerations
22
+
23
+
- Multi-byte values (integers, floats) are affected by endianness
24
+
- Single-byte values (u8, i8) are not affected
25
+
- Struct and collection serialization order is not changed by endianness
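
As a rough illustration of the byte-order rule above, the following sketch encodes a `u32` at fixed width in either order. The function name `encode_u32_fixed` and its `big_endian` flag are hypothetical helpers for this example, not part of bincode's configuration API, and the sketch ignores variable-length integer encoding entirely.

```rust
/// Encodes a `u32` as exactly four bytes in the requested byte order.
/// `big_endian` is a hypothetical flag for this sketch only.
fn encode_u32_fixed(value: u32, big_endian: bool) -> [u8; 4] {
    if big_endian {
        // Most significant byte first.
        value.to_be_bytes()
    } else {
        // Least significant byte first (the default described above).
        value.to_le_bytes()
    }
}
```

With this helper, `encode_u32_fixed(0x1234_5678, false)` yields the bytes `78 56 34 12`, while passing `true` yields `12 34 56 78`. A single-byte value would be unaffected either way.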
26
+
27
+
## Basic Types
28
+
29
+
### Boolean Encoding
30
+
31
+
- Encoded as a single byte
32
+
- `false` is represented by `0`
33
+
- `true` is represented by `1`
34
+
- During deserialization, values other than 0 and 1 will result in an error [`DecodeError::InvalidBooleanValue`](https://docs.rs/bincode/2.0.0-rc/bincode/error/enum.DecodeError.html#variant.InvalidBooleanValue)
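
The boolean rules above can be sketched as a pair of small functions. `InvalidBool` is a hypothetical stand-in error type for this example; bincode itself reports `DecodeError::InvalidBooleanValue` in this situation.

```rust
/// Hypothetical error type for this sketch, carrying the rejected byte.
#[derive(Debug, PartialEq)]
struct InvalidBool(u8);

/// Encodes a boolean as a single byte: 0 for `false`, 1 for `true`.
fn encode_bool(value: bool) -> u8 {
    u8::from(value)
}

/// Decodes a single byte, rejecting anything other than 0 or 1.
fn decode_bool(byte: u8) -> Result<bool, InvalidBool> {
    match byte {
        0 => Ok(false),
        1 => Ok(true),
        other => Err(InvalidBool(other)),
    }
}
```

Rejecting other byte values, rather than treating every non-zero byte as `true`, keeps the encoding canonical: each boolean has exactly one valid byte representation.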
35
+
36
+
### Numeric Types
37
+
38
+
- Encoded based on the configured [IntEncoding](#intencoding)
39
+
- Signed integers use 2's complement representation
40
+
- Floating point types use the IEEE 754-2008 standard
41
+
  - `f32`: 4 bytes (binary32)
42
+
  - `f64`: 8 bytes (binary64)
43
+
44
+
#### Floating Point Special Values
45
+
46
+
- Subnormal numbers are preserved
47
+
  - Also known as denormalized numbers
48
+
  - Maintain their exact bit representation
49
+
- `NaN` values are preserved
50
+
  - Both quiet and signaling `NaN` are kept as-is
51
+
  - Bit pattern of `NaN` is maintained exactly
52
+
- No normalization or transformation of special values occurs
53
+
- Serialization and deserialization do not alter the bit-level representation
54
+
- Consistent with IEEE 754-2008 standard for floating-point arithmetic
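
A minimal sketch of what bit-exact preservation means, assuming a fixed 4-byte little-endian layout for `f32`. The helper names are hypothetical. The Rust standard library notes that NaN bit patterns are not guaranteed to survive on every platform, so treat this as an illustration rather than a portability guarantee.

```rust
/// Writes the raw IEEE 754 binary32 bits in little-endian order.
fn encode_f32_le(value: f32) -> [u8; 4] {
    value.to_le_bytes()
}

/// Reads the raw bits back without any normalization.
fn decode_f32_le(bytes: [u8; 4]) -> f32 {
    f32::from_le_bytes(bytes)
}
```

Comparing `to_bits()` before and after a round trip (rather than using `==`, which is always false for `NaN`) is the way to check that a NaN payload or a subnormal value came back unchanged.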
55
+
56
+
### Character Encoding
57
+
58
+
- `char` is encoded as a 32-bit unsigned integer representing its Unicode Scalar Value
59
+
- Valid Unicode Scalar Value range:
60
+
  - 0x0000 to 0xD7FF (Basic Multilingual Plane, below the surrogate range)
61
+
  - 0xE000 to 0x10FFFF (remainder of the Basic Multilingual Plane and the Supplementary Planes)
62
+
- Surrogate code points (0xD800 to 0xDFFF) are not valid
63
+
- Invalid `char` values can only be produced via unsafe code; they are handled as follows:
64
+
  - during serialization: data is written as-is
65
+
  - during deserialization: an error is raised [`DecodeError::InvalidCharEncoding`](https://docs.rs/bincode/2.0.0-rc/bincode/error/enum.DecodeError.html#variant.InvalidCharEncoding)
66
+
- No additional metadata or encoding scheme beyond the raw code point value
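
The validity check described above can be sketched with the standard library's `char::from_u32`, which already rejects surrogates and values above 0x10FFFF. `InvalidChar` is a hypothetical error type standing in for `DecodeError::InvalidCharEncoding`, and the sketch deliberately says nothing about how the 32-bit value is laid out in bytes.

```rust
/// Hypothetical error type for this sketch, carrying the rejected value.
#[derive(Debug, PartialEq)]
struct InvalidChar(u32);

/// Converts a `char` to its Unicode Scalar Value.
fn char_to_scalar(c: char) -> u32 {
    c as u32
}

/// Converts a decoded 32-bit value back to a `char`, rejecting
/// surrogate code points and values above 0x10FFFF.
fn scalar_to_char(value: u32) -> Result<char, InvalidChar> {
    char::from_u32(value).ok_or(InvalidChar(value))
}
```

For example, `scalar_to_char(0xD800)` returns an error because 0xD800 is a surrogate code point, while `scalar_to_char(0xE000)` succeeds.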
18
67
19
68
All tuples have no additional bytes, and are encoded in their specified order, e.g.
Bincode currently supports 2 different types of `IntEncoding`. With the default config, `VarintEncoding` is selected.
31
82
32
83
### VarintEncoding
84
+
33
85
Encoding an unsigned integer `u` (of any type except `u8`/`i8`) works as follows:
34
86
35
87
1. If `u < 251`, encode it as a single byte with that value.
@@ -54,7 +106,7 @@ See the documentation of [FixintEncoding](https://docs.rs/bincode/2.0.0-rc/binco
54
106
55
107
Enums are encoded with their variant index first, optionally followed by the variant fields. The variant index is encoded according to the configured `IntEncoding`.
56
108
57
-
Both named and unnamed fields are serialized with their values only, and therefor encode to the same value.
109
+
Both named and unnamed fields are serialized with their values only, and therefore encode to the same value.
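
To make the claim about named and unnamed fields concrete, here is a hand-written sketch that assumes a fixed-width, little-endian `u32` variant index. The enum `Shape` and the function `encode_shape_fixint` are hypothetical and written by hand for illustration; they are not what the derive macro generates.

```rust
/// Hypothetical enum with a unit, a tuple, and a struct variant.
enum Shape {
    Unit,
    Tuple(u8, u8),
    Named { x: u8, y: u8 },
}

/// Encodes the variant index as a 4-byte little-endian `u32`,
/// followed by the field values in declaration order.
fn encode_shape_fixint(shape: &Shape) -> Vec<u8> {
    let (index, payload): (u32, Vec<u8>) = match shape {
        Shape::Unit => (0, vec![]),
        Shape::Tuple(a, b) => (1, vec![*a, *b]),
        Shape::Named { x, y } => (2, vec![*x, *y]),
    };
    let mut out = index.to_le_bytes().to_vec();
    out.extend(payload);
    out
}
```

Under these assumptions, `Shape::Tuple(5, 6)` and `Shape::Named { x: 5, y: 6 }` differ only in the variant index: the payload bytes `05 06` are identical because field names are never written.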
Collections are encoded with their length value first, following by each entry of the collection. The length value is based on your `IntEncoding`.
161
+
## General Collection Serialization
109
162
110
-
**note**: fixed array length may not have their `len` encoded. See [Arrays](#arrays)
163
+
Collections are encoded with their length value first, followed by each entry of the collection. The length value is based on the configured `IntEncoding`.
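
A minimal sketch of the length-prefix rule for a byte sequence. It covers only the single-byte case of the variable-length integer rule (a length below 251 is written as one byte with that value) and returns `None` for anything longer. The function name `encode_short_byte_vec` is hypothetical.

```rust
/// Encodes a short byte sequence as its length followed by its entries.
/// Only lengths below 251 are handled; longer inputs return `None`
/// because multi-byte length encoding is out of scope for this sketch.
fn encode_short_byte_vec(items: &[u8]) -> Option<Vec<u8>> {
    if items.len() >= 251 {
        return None;
    }
    let mut out = Vec::with_capacity(items.len() + 1);
    // Length first...
    out.push(items.len() as u8);
    // ...then each entry in iteration order.
    out.extend_from_slice(items);
    Some(out)
}
```

For the three-element input `[0, 1, 2]`, this produces the four bytes `03 00 01 02`.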
111
164
112
-
```rust
113
-
let list = vec![
114
-
0u8,
115
-
1u8,
116
-
2u8
117
-
];
165
+
### Serialization Considerations
166
+
167
+
- Length is always serialized first
168
+
- Entries are serialized in the order they are returned from the iterator implementation.
169
+
- Iteration order depends on the collection type
170
+
  - Ordered collections (e.g., `Vec`): Iteration from lowest to highest index
171
+
  - Unordered collections (e.g., `HashMap`): Implementation-defined iteration order
172
+
- Bincode does not check for duplicate keys, but they may result in an error when decoding a container from a list of pairs.
173
+
174
+
### Handling of Specific Collection Types
118
175
176
+
#### Linear Collections (`Vec`, Arrays, etc.)
177
+
178
+
- Serialized by iterating from lowest to highest index
This also applies to e.g. `HashMap`, where each entry is a [tuple](#basic-types) of the key and value.
193
+
#### Key-Value Collections (`HashMap`, etc.)
194
+
195
+
- Serialized as a sequence of key-value pairs
196
+
- Iteration order is implementation-defined
197
+
- Each entry is a tuple of (key, value)
198
+
199
+
### Special Collection Considerations
200
+
201
+
- Bincode will serialize the entries based on the iterator order.
202
+
- Deserialization is deterministic but the collection implementation might not guarantee the same order as serialization.
203
+
204
+
**Note**: Fixed-length arrays do not have their length encoded. See [Arrays](#arrays) for details.
129
205
130
206
# String and &str
131
207
132
-
Both `String` and `&str` are treated as a `Vec<u8>`. See [Collections](#collections) for more information.
208
+
## Encoding Principles
209
+
210
+
- Strings are encoded as UTF-8 byte sequences
211
+
- No null terminator is added
212
+
- No Byte Order Mark (BOM) is written
213
+
- Unicode non-characters are preserved
214
+
215
+
### Encoding Details
216
+
217
+
- Length is encoded first using the configured `IntEncoding`
218
+
- Raw UTF-8 bytes follow the length
219
+
- Supports the full range of valid UTF-8 sequences
220
+
- `U+0000` and other code points can appear freely within the string
221
+
222
+
### Unicode Handling
223
+
224
+
- During serialization, the string is encoded as a sequence of the given bytes.
225
+
- Rust strings are UTF-8 encoded by default, but this is not enforced by bincode
226
+
- No normalization or transformation of text
227
+
- If an invalid UTF-8 sequence is encountered during decoding, a [`DecodeError::Utf8`](https://docs.rs/bincode/2.0.0-rc/bincode/error/enum.DecodeError.html#variant.Utf8) error is raised
133
228
134
229
```rust
135
-
let str = "Hello"; // Could also be `String::new(...)`
Tuple fields are serialized in first-to-last declaration order, with no additional metadata.
289
+
290
+
- No length prefix is added
291
+
- Fields are encoded sequentially
292
+
- No padding or alignment adjustments are made
293
+
- Order of serialization is deterministic and matches the tuple's declaration order
294
+
295
+
## StructEncoding
296
+
297
+
Struct fields are serialized in first-to-last declaration order, with no metadata representing field names.
298
+
299
+
- No length prefix is added
300
+
- Fields are encoded sequentially
301
+
- No padding or alignment adjustments are made
302
+
- Order of serialization is deterministic and matches the struct's field declaration order
303
+
- Both named and unnamed fields are serialized identically
304
+
305
+
## EnumEncoding
306
+
307
+
Enum variants are encoded with a discriminant followed by optional variant payload.
308
+
309
+
### Discriminant Allocation
310
+
311
+
- Discriminants are automatically assigned by the derive macro in declaration order
312
+
- First variant starts at 0
313
+
- Subsequent variants increment by 1
314
+
- Explicit discriminant indices are currently not supported
315
+
- Discriminant is always represented as a `u32` during serialization. See [Discriminant Representation](#discriminant-representation) for more details.
316
+
- Maintains the original enum variant semantics during encoding
317
+
318
+
### Variant Payload Encoding
319
+
320
+
- Tuple variants: Fields serialized in declaration order
321
+
- Struct variants: Fields serialized in declaration order
322
+
- Unit variants: No additional data encoded
323
+
324
+
### Discriminant Representation
325
+
326
+
- Always encoded as a `u32`
327
+
- Encoding method depends on the configured `IntEncoding`
328
+
  - `VarintEncoding`: Variable-length encoding
329
+
  - `FixintEncoding`: Fixed 4-byte representation
330
+
331
+
### Handling of Variant Payloads
332
+
333
+
- Payload is serialized immediately after the discriminant
334
+
- No additional metadata about field names or types
335
+
- Payload structure matches the variant's definition