-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add function to efficiently hash entire non-root subtrees to guts #329
base: master
Are you sure you want to change the base?
Conversation
provides an efficient way to compute the hash for multiple chunks that do not start at chunk 0, as part of a larger tree. This is useful for crates like bao, abao and bao-tree, especially when using chunk groups. Adding this required to disable some assertions that contained assumptions that are no longer true.
I am not sure what is the best way to expose the required method to guts. It does not matter as long as there is a way to hash a non-root subtree that starts at a non-zero chunk. |
src/lib.rs
Outdated
@@ -1246,7 +1267,7 @@ impl Hasher { | |||
// also. Convert it directly into an Output. Otherwise, we need to | |||
// merge subtrees below. | |||
if self.cv_stack.is_empty() { | |||
debug_assert_eq!(self.chunk_state.chunk_counter, 0); | |||
// debug_assert_eq!(self.chunk_state.chunk_counter, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be disabled because the chunk counter can be set to a non zero value in the constructor now.
src/lib.rs
Outdated
self.chunk_state.chunk_counter.count_ones() as usize, | ||
"cv stack does not need a merge" | ||
); | ||
// debug_assert_eq!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This must be disabled since because the chunk counter does not necessarily start at 0, it's current value is no longer equivalent to the number of chunks.
I don't think there is a way to reenable this except by adding an additional value to the state that tracks the start chunk.
@@ -1304,6 +1325,11 @@ impl Hasher { | |||
self.final_output().root_hash() | |||
} | |||
|
|||
fn finalize_node(&self, is_root: bool) -> Hash { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just exposes a way to create non root hashes
src/lib.rs
Outdated
@@ -409,6 +409,18 @@ impl Output { | |||
Hash(platform::le_bytes_from_words_32(&cv)) | |||
} | |||
|
|||
fn hash(&self, is_root: bool) -> Hash { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just exposes a way to create non root hashes
and add optional rayon support
I am not quite sure, but it seems that what this blake3 fork is doing would also be covered with this: fleek-network@0e15418 |
ecd9b41
to
a050f34
Compare
also add a little brute force test
I'm out of the country and mostly offline until August 20, so I won't be able to take a proper look until then. But you might be interested in the new "guts" API I'm in the middle of:
Apologies for the big pile of undocumented code there :) But my hope is that I can port BLAKE3's existing SIMD implementations to that expanded API and then release that as a (permanently unstable) crate that other low-level callers can depend on. Here are a couple other projects I hope to support with the same API: |
@oconnor663-travel this looks awesome. No stress, we have forked the crate and are depending on that one now, so there is not much urgency. Once a crate is out that allows us to compute subtree hashes efficiently, we will yank that one and go back to upstream. Happy to close this if there is something better in the works. It would be nice if there was an example or a fn for the hash_subtree use case implemented in this PR. Enjoy being offline. |
provides an efficient way to compute the hash for multiple chunks that do not start at chunk 0, as part of a larger tree.
This is useful for crates like bao, abao and bao-tree when using chunk groups.
Adding this required to disable some assertions that contained assumptions that are no longer true.
This gives a performance improvement of 4x when computing an outboard with a chunk group size of 16kb in bao-tree compared to using
blake3::guts::ChunkState
andblake3::guts::parent_cv
to compute a hash for a chunk group.UPDATE:
this works in the case of bao-tree, but the signature is not good since it can't work in the general case. E.g.
This does not make any sense at all, since there is no single root for chunks 1 and 2. I guess you could put in an assert to catch things like this?