-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge bulk lookup proofs #163
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's close but we need to fix the mysql metrics gathering and I think there's more optimizations that are possible for batch lookups. But the second part could be pushed to a future PR.
let mut lookup_proofs = Vec::new(); | ||
for i in 0..unames.len() { | ||
lookup_proofs.push( | ||
self.lookup_with_info::<H>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need the info? I think if we've done the proper BFS loading, we should only need to batch-get the value states in order to load them into the cache, then we can do the regular proof generation operations since we'll only be accessing from local memory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to eliminate duplicate get_user_state
queries. One in identifying which labels are needed and one in (previously) lookup for the same purpose. This way pre-loaded info is passed to the lookup function, so no re-loads.
Looking at the code for MySQL's get_user_state
, are we missing looking up the queried state in cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, yes we've excluded get_user_state
from the caching logic because of the filter parameters. How would you know a cached value matches "<= epoch" or some weird filter without going to the backing store? It sounds like we might need a new query at the storage layer to retrieve the max versions for a collection of users which is <= a given epoch. (If I'm remembering right). We could move forward with this however and simply open an issue for it, but we'll want to do it properly for batch proof generation. Here's we will hit order N queries just to gather the user states, while not awful, still not ideal when we can do a better query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great, let's merge this in!
This PR adds support for bulk lookup proofs (See Issue #103).
First, for each lookup proof requested, necessary labels are calculated. Then, the nodes corresponding to their self- and sibling-prefixes are preloaded before the lookup operations (See the comments above the function
build_lookup_prefixes_set
for my understanding why this is the case).To test it, I added a new feature to the MySQL implementation where each storage call is tracked with its self-(
read
/write
) and data-type (e.g.,HistoryTreeNode
). Here is its output with the following parameters:Note the reduction in completion time (about 40% but may vary depending on the parameters) and
get_direct
calls replaced withbatch_get
s in bulk lookups.This code was a bit tricky to test since MySQL metrics are reset when printed and nodes in cache may change the resulting numbers. In addition, the users looked up might result in different time measurements. I tried and accounted for all that but if you notice any issues, please let me know!