Pre-hash rows and add SmallHashSets #318
Open
Description of Changes
Annotate rows with their hashes. This is halfway to the old solution of annotating rows with their serialized forms; it speeds up insertion into BTreeIndexes while saving a byte array allocation per row.
(Wait, why do we hash when inserting into BTreeIndexes, you ask? It's because we use a HashSet to store the multiple rows corresponding to a non-unique key.)
This saves the main thread from repeatedly hashing the same values. Once we parallelize message processing, this work will be spread out over multiple threads too.
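To illustrate the pre-hashing idea, here is a minimal sketch in Rust. It is not the code from this PR: the PreHashedRow type and the count_distinct helper are hypothetical names chosen for the example. It assumes the hash is computed once when the row is wrapped, and that every later set insertion only feeds the cached value to the hasher.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical wrapper: a row together with a hash computed once, up front.
#[derive(Debug, Clone)]
struct PreHashedRow<T> {
    row: T,
    hash: u64,
}

impl<T: Hash> PreHashedRow<T> {
    /// Hash the row a single time; this is the only place the row is walked.
    fn new(row: T) -> Self {
        let mut hasher = DefaultHasher::new();
        row.hash(&mut hasher);
        let hash = hasher.finish();
        PreHashedRow { row, hash }
    }
}

impl<T: PartialEq> PartialEq for PreHashedRow<T> {
    fn eq(&self, other: &Self) -> bool {
        // Compare the cheap cached hash first, then the full row on a match.
        self.hash == other.hash && self.row == other.row
    }
}

impl<T: Eq> Eq for PreHashedRow<T> {}

impl<T> Hash for PreHashedRow<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed the cached value instead of re-hashing the whole row.
        state.write_u64(self.hash);
    }
}

/// Wrap each row once, collect into a set, and report how many are distinct.
fn count_distinct<T: Hash + Eq>(rows: Vec<T>) -> usize {
    let set: HashSet<PreHashedRow<T>> = rows.into_iter().map(PreHashedRow::new).collect();
    set.len()
}
```

In a design like this, the wrapping step can run wherever the row is first decoded, so the thread that later inserts into the index only pays for hashing a single u64.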
Also adds a data type SmallHashSet<T> for use in BTreeIndexes. This is a struct that can store up to one element without allocating. This gives dramatic performance improvements on initial connection: it seems that most rows in BTreeIndexes don't have any other rows with the same key, so skipping all allocations in this case gives very good performance.
API
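The SmallHashSet<T> idea described above can be sketched roughly as follows. This is an illustration in Rust under stated assumptions, not the type added by this PR: the variant names, the method set, and the is_heap_backed helper are all hypothetical. The sketch assumes the set keeps a single element inline and only builds a real HashSet when a second, distinct element arrives.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical small set: zero or one element is stored inline,
/// and a heap-backed HashSet is created only for two or more elements.
#[derive(Debug)]
enum SmallHashSet<T> {
    Empty,
    One(T),
    Many(HashSet<T>),
}

impl<T: Hash + Eq> SmallHashSet<T> {
    fn new() -> Self {
        SmallHashSet::Empty
    }

    /// Insert a value; returns true if it was not already present.
    fn insert(&mut self, value: T) -> bool {
        // Take the current state out so it can be rebuilt by value.
        match std::mem::replace(self, SmallHashSet::Empty) {
            SmallHashSet::Empty => {
                *self = SmallHashSet::One(value);
                true
            }
            SmallHashSet::One(existing) => {
                if existing == value {
                    // Duplicate of the single element: stay inline.
                    *self = SmallHashSet::One(existing);
                    false
                } else {
                    // Second distinct element: fall back to a real HashSet.
                    let mut set = HashSet::with_capacity(2);
                    set.insert(existing);
                    set.insert(value);
                    *self = SmallHashSet::Many(set);
                    true
                }
            }
            SmallHashSet::Many(mut set) => {
                let added = set.insert(value);
                *self = SmallHashSet::Many(set);
                added
            }
        }
    }

    fn contains(&self, value: &T) -> bool {
        match self {
            SmallHashSet::Empty => false,
            SmallHashSet::One(existing) => existing == value,
            SmallHashSet::Many(set) => set.contains(value),
        }
    }

    fn len(&self) -> usize {
        match self {
            SmallHashSet::Empty => 0,
            SmallHashSet::One(_) => 1,
            SmallHashSet::Many(set) => set.len(),
        }
    }

    /// True once the set has spilled into a heap-allocated HashSet.
    fn is_heap_backed(&self) -> bool {
        matches!(self, SmallHashSet::Many(_))
    }
}
```

If most keys in an index map to exactly one row, as the description suggests, nearly every set in a structure like this would stay in the One state and never touch the allocator.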
Requires SpacetimeDB PRs
Testsuite
SpacetimeDB branch name: master
Testing
I will test: