shrink

A Lean Data Exchange Format for the Web

FORMAT

Shrink data is encoded in little-endian byte order, which most modern architectures support natively.

The content can be made up of any number of values, key-value pairs and nested structures.

Each value is preceded by a type.

Nested structures are enclosed in delimiters. Array structures contain a sequence of values, while map structures contain key-value pairs separated by key_value delimiters.

Strings are null terminated like C strings. (Probably a bad idea)

TYPES

- int     - 4 bits - `1`  - `0x1`
- sint    - 4 bits - `3`  - `0x3`
- float32 - 4 bits - `4`  - `0x4`
- float64 - 4 bits - `5`  - `0x5`
- ascii   - 4 bits - `6`  - `0x6`
- utf-8   - 4 bits - `7`  - `0x7`

DELIMITERS

- key_value  - 4 bits - `15` - `0xF`
- nest_open  - 4 bits - `14` - `0xE`
- nest_close - 4 bits - `13` - `0xD`
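
As a rough illustration, the 4-bit codes can be collected into one enumeration. This is only a sketch: the values are taken from the hex column of the TYPES and DELIMITERS lists, while the names `Tag`, `DELIMITERS` and `classify` are hypothetical and not part of Shrink.

```python
from enum import IntEnum


class Tag(IntEnum):
    """4-bit codes copied from the hex column of the lists above."""
    INT = 0x1
    SINT = 0x3
    FLOAT32 = 0x4
    FLOAT64 = 0x5
    ASCII = 0x6
    UTF8 = 0x7
    NEST_CLOSE = 0xD
    NEST_OPEN = 0xE
    KEY_VALUE = 0xF


# Codes that mark structure rather than introduce a value.
DELIMITERS = frozenset({Tag.KEY_VALUE, Tag.NEST_OPEN, Tag.NEST_CLOSE})


def classify(nibble):
    """Return 'delimiter' or 'type' for a 4-bit code.

    Raises ValueError for codes that are not listed above.
    """
    tag = Tag(nibble & 0xF)
    return "delimiter" if tag in DELIMITERS else "type"
```

A reader built this way would look at one nibble, decide whether it opens or closes a structure, and otherwise hand the following bits to the decoder for that type.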

VALUES

- unsigned integers - variable - LEB128 encoding
- signed integers   - variable - Custom LEB128 encoding
- floats            - fixed    - IEEE 754 encoding
- strings           - variable - Custom prefixVarint encoding

ASCII
0xxxxxxx 0xxxxxxx 0xxxxxxx 0xxxxxxx -> 0110 xxxxxxxx xxxxxxxx xxxxxxxx xxxx

UTF-8
0xxxxxxx                            -> 0111 00xxxxxx x.......
110xxxxx 10xxxxxx                   -> 0111 01xxxxxx xxxxx...
1110xxxx 10xxxxxx 10xxxxxx          -> 0111 10xxxxxx xxxxxxxx xx......
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx -> 0111 11xxxxxx xxxxxxxx xxxxxxx.
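
The two layouts above can be read as bit packing: ASCII drops the always-zero top bit of each character, and UTF-8 replaces the multi-byte framing with a 2-bit length class followed by the raw code point bits (7, 11, 16 or 21). The sketch below follows that reading. It assumes the type nibble starts on a byte boundary, that bits are written most significant first, and that the last byte is zero-padded. It leaves out any length prefix or terminator, since the layouts do not show one. All function names are hypothetical.

```python
ASCII_TAG = 0b0110
UTF8_TAG = 0b0111

# (largest code point, 2-bit class, payload width in bits)
UTF8_CLASSES = [
    (0x7F, 0b00, 7),
    (0x7FF, 0b01, 11),
    (0xFFFF, 0b10, 16),
    (0x10FFFF, 0b11, 21),
]


def pack_bits(fields):
    """Concatenate (value, width) fields MSB-first, zero-padding the last byte."""
    acc = 0
    total = 0
    for value, width in fields:
        acc = (acc << width) | (value & ((1 << width) - 1))
        total += width
    pad = -total % 8
    return (acc << pad).to_bytes((total + pad) // 8, "big")


def pack_ascii(text):
    """Type nibble followed by 7 bits per character."""
    fields = [(ASCII_TAG, 4)]
    for ch in text:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("not an ASCII character: %r" % ch)
        fields.append((code, 7))
    return pack_bits(fields)


def pack_utf8(text):
    """Type nibble, then a 2-bit class and the code point bits per character."""
    fields = [(UTF8_TAG, 4)]
    for ch in text:
        cp = ord(ch)
        for limit, cls, width in UTF8_CLASSES:
            if cp <= limit:
                fields.append((cls, 2))
                fields.append((cp, width))
                break
    return pack_bits(fields)
```

Under these assumptions, four ASCII characters plus the type nibble fit in exactly 4 bytes, which matches the ASCII layout line.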

OPTIMIZATIONS

  • 64-bit integers have an implicit stop because Shrink doesn't expect integers larger than 64 bits. The ninth byte can therefore carry 8 payload bits, so a full 64-bit integer takes 9 bytes instead of the 10 that plain LEB128 needs.

  • Octet redundancy is removed, as in VLQ encoding.

  • Signed integers use the same LEB128 conversion as the unsigned integers so there is no need for two's complement operations. The only consideration is sign extending the compressed value.
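
The implicit stop can be sketched as ordinary unsigned LEB128 with a cap of nine bytes. This is a minimal illustration, not Shrink's actual encoder: it does not attempt the octet-redundancy optimization, the function names are hypothetical, and `sign_extend` shows only the generic sign-extension step, not the custom signed scheme.

```python
MAX_BYTES = 9  # 8 groups of 7 bits + one full 8-bit byte = 64 bits


def encode_uint(value):
    """Encode an unsigned 64-bit integer as LEB128 with an implicit ninth-byte stop."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value must fit in 64 bits")
    out = bytearray()
    while len(out) < MAX_BYTES - 1:
        group = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(group)         # high bit clear: explicit stop
            return bytes(out)
        out.append(group | 0x80)      # high bit set: more bytes follow
    out.append(value)                 # ninth byte: all 8 bits are payload
    return bytes(out)


def decode_uint(data):
    """Return (value, number of bytes consumed)."""
    value = 0
    for i, byte in enumerate(data[:MAX_BYTES]):
        if i == MAX_BYTES - 1:
            # No continuation flag here: the stop is implied by position.
            return value | (byte << 56), MAX_BYTES
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            return value, i + 1
    raise ValueError("truncated integer")


def sign_extend(raw, bits):
    """Interpret the low `bits` bits of raw as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (raw & (sign - 1)) - (raw & sign)
```

Small values keep their usual LEB128 length. Only values of 2^56 and above reach the ninth byte, where the continuation flag is no longer needed.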

SIMPLE EXAMPLE

{
    "name": "James Bourdelon",
    "age": 42,
    "height": 170.688,
    "favorite_quotes": {
        "Walt Disney": "The way to get started is to quit talking and begin doing",
        "Anne Frank": "Whoever is happy will make others happy too"
    },
    "spoken_languages": [
        "English",
        "German"
    ]
}

STALE (Some improvements have been made)

0x01 0x73 # 1 s (Preamble)
0x_F 0x_A 0x6E 0x61 0x6D 0x65 0x00 # name:
0x_A 0x4a 0x61 0x6d 0x65 0x73 0x20 0x42 0x6f 0x75 0x72 0x64 0x65 0x6c 0x6f 0x6e 0x00 # "James Bourdelon"
0x_F 0x_A 0x61 0x67 0x65 0x00 # age:
0x_0 0x2A # 42
0x_F 0x_A 0x68 0x65 0x69 0x67 0x68 0x74 0x00 # height:
0x_8 0x43 0x2A 0xB0 0x21 # 170.688
0x_A 0x66 0x61 0x76 0x6f 0x72 0x69 0x74 0x65 0x5f 0x71 0x75 0x6f 0x74 0x65 0x73 0x00 # favorite_quotes:
0x_E # {
0x_F 0x_A 0x57 0x61 0x6c 0x74 0x20 0x44 0x69 0x73 0x6e 0x65 0x79 0x0E 0x00 # "Walt Disney":
0x_A 0x54 0x68 0x65 0x20 0x77 0x61 0x79 0x20 0x74 0x6f 0x20 0x67 0x65 0x74 0x20 0x73 0x74 0x61 0x72 0x74 0x65 0x64 0x20 0x69 0x73 0x20 0x74 0x6f 0x20 0x71 0x75 0x69 0x74 0x20 0x74 0x61 0x6c 0x6b 0x69 0x6e 0x67 0x20 0x61 0x6e 0x64 0x20 0x62 0x65 0x67 0x69 0x6e 0x20 0x64 0x6f 0x69 0x6e 0x67 0x00 # "The way to get started is to quit talking and begin doing"
0x_F 0x_A 0x41 0x6e 0x6e 0x65 0x20 0x46 0x72 0x61 0x6e 0x6b 0x00 # "Anne Frank":
0x_A 0x57 0x68 0x6f 0x65 0x76 0x65 0x72 0x20 0x69 0x73 0x20 0x68 0x61 0x70 0x70 0x79 0x20 0x77 0x69 0x6c 0x6c 0x20 0x6d 0x61 0x6b 0x65 0x20 0x6f 0x74 0x68 0x65 0x72 0x73 0x20 0x68 0x61 0x70 0x70 0x79 0x20 0x74 0x6f 0x6f 0x00 # "Whoever is happy will make others happy too"
0x_D # }
0x_A 0x73 0x70 0x6f 0x6b 0x65 0x6e 0x5f 0x6c 0x61 0x6e 0x67 0x75 0x61 0x67 0x65 0x73 0x00 # spoken_languages:
0x_E # [
0x_A 0x45 0x6e 0x67 0x6c 0x69 0x73 0x68 0x00 # "English"
0x_A 0x47 0x65 0x72 0x6d 0x61 0x6e 0x00 # "German"
0x_D # ]

For this contrived example, the JSON version (without spaces) takes 246 bytes while the Shrink version takes 227, saving only 19 bytes.

TODOs

  • Conversion from shrink to JSON
  • Conversion from JSON to shrink
  • Deserializing from shrink to object
  • Serializing from object to shrink