Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Range encoding vs bitfield encoding could be more optimal #35

Open
chrispaterson opened this issue Dec 13, 2018 · 0 comments
Open

Range encoding vs bitfield encoding could be more optimal #35

chrispaterson opened this issue Dec 13, 2018 · 0 comments

Comments

@chrispaterson
Copy link
Collaborator

chrispaterson commented Dec 13, 2018

The encoding should first encode the bitfield, because that will always need to be available because of the "parsedVendorConsents" api we expose. But when we should determine if range encoding would be smaller by using some math to calculate ahead of beginning the range encoding.

Consent string fields

Range encoding constant sunk size cost:

1 bit for default consent
12 bits for num entries
13 total bits

Each range size cost:

1 bit for SingleOrRange[idx]
16 bits for SingleVendorId[idx] or StartVendorId[idx]
16 bits (if range) for EndVendorId[idx]
33 total bits or 17 total bits for single

Therefore, a range encoding, in the worst case (all sub-ranges and no singles), can be expressed as:

f(n) = 33n + 13; where n is the number of gaps and points where consent changes from the complement to the default and default to compliment. Any case would create a new range.

With a bitfield encoding, we know there will be a constant number of bits equal to the maxVendorId. The equation for determining which encoding to execute to minimize the size of the final encoding would be:

f(n) < maxVendorId; then range encode
f(n) >= maxVendorId; then bitfield

It would be possible to begin counting gaps in the vendorlist and consent value changes from default and run that through f(n) on each loop until that number is greater than maxVendorId and then break the loop and abandon the range encoding.

On vendorlist version 125 (latest) I found 67 gaps in 460 vendors. f(67) = 2224 which is far greater than the 560 (maxVendorId) that would have been allocated if it were a bitfield encoding. It would be a shame to spend the time to encode that range encoded string and then just throw it out when we can know ahead of time with some math if it will be too large.

With this vendorlist, assuming default consent is 0 and each vendor has a 1, we could only encode 16 range sections before the encoding length would be longer than the bitfield.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant