Skip to content

Range encoding vs bitfield encoding could be more optimal #35

Open
@chrispaterson

Description

@chrispaterson

The encoding should first encode the bitfield, because that will always need to be available because of the "parsedVendorConsents" api we expose. But when we should determine if range encoding would be smaller by using some math to calculate ahead of beginning the range encoding.

Consent string fields

Range encoding constant sunk size cost:

1 bit for default consent
12 bits for num entries
13 total bits

Each range size cost:

1 bit for SingleOrRange[idx]
16 bits for SingleVendorId[idx] or StartVendorId[idx]
16 bits (if range) for EndVendorId[idx]
33 total bits or 17 total bits for single

Therefore, a range encoding, in the worst case (all sub-ranges and no singles), can be expressed as:

f(n) = 33n + 13; where n is the number of gaps and points where consent changes from the complement to the default and default to compliment. Any case would create a new range.

With a bitfield encoding, we know there will be a constant number of bits equal to the maxVendorId. The equation for determining which encoding to execute to minimize the size of the final encoding would be:

f(n) < maxVendorId; then range encode
f(n) >= maxVendorId; then bitfield

It would be possible to begin counting gaps in the vendorlist and consent value changes from default and run that through f(n) on each loop until that number is greater than maxVendorId and then break the loop and abandon the range encoding.

On vendorlist version 125 (latest) I found 67 gaps in 460 vendors. f(67) = 2224 which is far greater than the 560 (maxVendorId) that would have been allocated if it were a bitfield encoding. It would be a shame to spend the time to encode that range encoded string and then just throw it out when we can know ahead of time with some math if it will be too large.

With this vendorlist, assuming default consent is 0 and each vendor has a 1, we could only encode 16 range sections before the encoding length would be longer than the bitfield.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions