Handle large indexes better #106
In my own implementation of indexing (which also supports multi-column indexing), unique indexes use a key that encodes only the serialized column values, with the primary key stored as the value. Columns are serialized in a fixed-width format so everything sorts nicely lexicographically, mostly as raw []bytes or hex strings; strings require some extra care due to their variable length. I use this form because unique constraint collisions are trivial to detect without doing a range scan: a duplicate insert lands on an existing key and is detected automatically.
For non-unique indices the key encodes the serialized column values followed by the primary key, and the value is left empty (I even disable value fetching in badger iterators, since the value is irrelevant). A point scan is then an iteration over the key prefix built from the column values. This also lets me use prefix scans to do range scans over the index with partial column information (i.e. leading equalities and an optional single trailing range). There is no limit on how many items a given secondary index key can hold, it scales, and adding or removing items is efficient. The long key prefix is also not an issue, thanks to key prefix compression in badger.
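Below is a minimal Go sketch of the layout described above. The function names, the `'i'` prefix byte, and the exact encoding are assumptions for illustration, not the commenter's actual code: the unique-index key carries only the encoded columns (the value would hold the primary key), while the non-unique key appends the primary key so every row gets its own entry.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeUint64 serializes an integer column in fixed-width big-endian form so
// that byte-wise comparison matches numeric order.
func encodeUint64(v uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, v)
	return b
}

// uniqueIndexKey builds a key whose uniqueness is enforced by the store itself:
// two rows with the same column values produce the same key, so a collision is
// detected on insert without a range scan. Prefix layout is hypothetical.
func uniqueIndexKey(indexID byte, cols ...[]byte) []byte {
	key := []byte{'i', indexID}
	for _, c := range cols {
		key = append(key, c...)
	}
	return key
}

// nonUniqueIndexKey appends the primary key after the column values, so every
// row gets its own index entry and the stored value can stay empty.
func nonUniqueIndexKey(indexID byte, pk []byte, cols ...[]byte) []byte {
	key := uniqueIndexKey(indexID, cols...)
	return append(key, pk...)
}

func main() {
	pk := encodeUint64(42)
	fmt.Printf("unique:     %x\n", uniqueIndexKey(1, encodeUint64(7)))
	fmt.Printf("non-unique: %x\n", nonUniqueIndexKey(1, pk, encodeUint64(7)))
}
```

With keys shaped like this, a point or range scan is just a prefix iteration over the column-value part of the key.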
See #105
The cost of inserting into an index grows as the number of values an index key refers to grows. This shouldn't be the case: the cost should be the same whether you're inserting your 1st value or your millionth.
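For illustration, here is a minimal sketch of why that happens, assuming a layout that stores every primary key for one index key inside a single value (an assumption for this example, not necessarily the actual on-disk format): each insert has to rewrite the whole accumulated entry, so the nth insert copies the n-1 entries already there.

```go
package main

import "fmt"

// appendPK models an assumed layout where all primary keys for one index key
// live in a single stored value. Every insert re-copies the whole value, so the
// cost is proportional to how many entries already exist.
func appendPK(existing []byte, pk []byte) []byte {
	out := make([]byte, 0, len(existing)+1+len(pk))
	out = append(out, existing...)   // copy every previously stored primary key
	out = append(out, byte(len(pk))) // length prefix for the new entry
	out = append(out, pk...)
	return out
}

func main() {
	var entry []byte
	for i := 0; i < 3; i++ {
		pk := []byte(fmt.Sprintf("pk-%d", i))
		entry = appendPK(entry, pk) // the nth call copies everything written so far
	}
	fmt.Printf("%d bytes after 3 inserts\n", len(entry))
}
```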
Indexes will need to be able to split into separate key ranges as they grow.
This will need a new index storage format, so I'll need to continue to support the old format and use the new one going forward if possible.
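One possible shape for that migration, sketched under assumptions (the version bytes and key layout here are hypothetical, not the project's planned encoding): tag new-format keys with a version marker, keep reading old-format keys as-is, and store one key per (column values, primary key) pair so insert cost no longer depends on how many rows share the same index key.

```go
package main

import "fmt"

// Hypothetical format discriminator; old keys keep their existing layout, new
// keys get a leading version byte so both formats can coexist in one store.
const (
	formatV1 byte = 1 // legacy: one key per index key, all primary keys in the value
	formatV2 byte = 2 // new: one key per (index key, primary key) pair, empty value
)

// v2IndexKey builds a new-format key: version byte, index id, encoded columns,
// then the primary key, so each row is an independent entry.
func v2IndexKey(indexID byte, cols, pk []byte) []byte {
	key := []byte{formatV2, indexID}
	key = append(key, cols...)
	return append(key, pk...)
}

// formatOf reports which layout a stored index key uses. This assumes legacy
// keys never begin with the formatV2 byte; a real migration would need a more
// robust discriminator.
func formatOf(key []byte) byte {
	if len(key) > 0 && key[0] == formatV2 {
		return formatV2
	}
	return formatV1
}

func main() {
	k := v2IndexKey(1, []byte("colval"), []byte("pk42"))
	fmt.Println("format:", formatOf(k))
}
```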