Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include the index type in the index versioning information #263

Closed
3 tasks done
jeltsch opened this issue Jun 21, 2024 · 0 comments · Fixed by #363
Closed
3 tasks done

Include the index type in the index versioning information #263

jeltsch opened this issue Jun 21, 2024 · 0 comments · Fixed by #363
Assignees
Labels
enhancement New feature or request

Comments

@jeltsch
Copy link
Collaborator

jeltsch commented Jun 21, 2024

The serialization formats of both the compact and the ordinary index reserve the first 32 bits for storing the version of the index. Currently, we number versions of the compact and the ordinary index independently, which means that the type of index, compact or ordinary, cannot be detected from those first 32 bits. However, there is also no other part of a serialized index that would reliably tell its type. To get out of this unfortunate situation, @joris and I propose to use the first 32 bits of a serialized index also to indicate its type.

So far, we’ve been using the serialized version number also to capture the endianness used by a serialized index, which has been made possible by picking version numbers appropriately. It is important to preserve the ability to detect the endianness when including information about the index type. To this end, I propose the following structure for the 32-bit number to be stored at the beginning of an index:

  • Bits 0–7 contain the actual version number.
  • Bits 8–15 contain a number denoting the type of the index.
  • Bits 16–31 are zero.

Here, bit numbers refer to significance. To put things differently, the initial 32-bit number can be written in hexadecimal with 4-digits of which the first two denote the type of the index and the last two denote the actual version number. If version numbers are never zero, this structure guarantees that little-endian and big-endian can always be told apart. Mixed-endian could still cause problems, but my understanding is that virtually no architecture in use today supports this kind of endianness.

I furthermore propose the following encodings of index types:

  • 0 for the compact index
  • 1 for the ordinary index

With this scheme, nothing has to be changed for the compact index, while the ordinary index must have its version number 1 changed to the type–version indicator 0101h.

If this approach of storing the index type and the index version is adopted, we should perform the following tasks:

  • Adapt the code of the ordinary index as described above
  • Rename indexVersion in the code of the compact index such that its new, expanded meaning is expressed
  • Document this approach
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant