-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: state sync optimization #346
Conversation
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe changes introduce enhancements to GroveDB's state synchronization and storage management system. The modifications focus on improving subtree handling, introducing new methods for opening Merks by prefix, and refactoring the replication process. Key improvements include session-based state synchronization, more flexible storage context retrieval, and expanded methods for managing database operations across different contexts. Changes
Sequence DiagramsequenceDiagram
participant Client
participant GroveDB
participant StateSyncSession
participant StorageContext
Client->>GroveDB: start_syncing_session()
GroveDB->>StateSyncSession: Create new session
Client->>GroveDB: fetch_chunk()
GroveDB->>StorageContext: Get context by prefix
StorageContext-->>GroveDB: Return storage context
GroveDB->>StateSyncSession: Apply chunk
StateSyncSession-->>GroveDB: Process chunk
Client->>GroveDB: commit_session()
GroveDB->>StateSyncSession: Finalize synchronization
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (12)
tutorials/src/bin/replication.rs (3)
21-22
: UnifySubtreePrefix
type definition
You defineSubtreePrefix
here as[u8; blake3::OUT_LEN]
, but the same alias appears elsewhere. Consider unifying it into one central definition to avoid duplication across modules.
104-104
: Avoid panics by handling errors
Using.unwrap()
here will panic on error. In production scenarios, consider replacing.unwrap()
with the?
operator or custom error handling to provide more graceful error reporting.
246-270
: Enhance logging and session commit flow
start_time
usage is good for measuring sync duration. Consider adopting a structured logging approach in place ofprintln!
for better production adaptability.- Tracking
num_chunks
is helpful. You might also record chunk sizes to diagnose throughput or memory usage.commit_session
silently discards potential costs or errors. If future versions return aResult
, ensure to handle or log it.grovedb/src/replication.rs (5)
25-27
: Lack of error handling in commit_session
commit_session
swallows the cost result. If in future the commit operation can fail, consider returning aResult
to propagate potential errors.
49-49
: Revised transaction parameter
Renamingtx
totransaction
improves consistency with the rest of the code.
68-70
: Consider smaller helper method
This block might be refactored into a helper, e.g.open_merk_for_chunk(...)
, for improved readability if it grows further.
225-225
: Reduce type complexity
Clippy warns about the return signature inutil_split_global_chunk_id
. Consider using a dedicated struct or alias to clarify meaning and reduce complexity:🧰 Tools
🪛 GitHub Check: clippy
[warning] 225-225: very complex type used. Consider factoring parts into
type
definitions
warning: very complex type used. Consider factoring parts intotype
definitions
--> grovedb/src/replication.rs:225:6
|
225 | ) -> Result<(crate::SubtreePrefix, Option<Vec>, bool, Vec), Error> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#type_complexity
= note:#[warn(clippy::type_complexity)]
on by default
214-214
: Optimize string conversion
Clippy notes that borrowing&subtree
is immediately dereferenced. Also, you can replaceunwrap_or_else
withunwrap_or
for a constant fallback string.- let string = std::str::from_utf8(&subtree).unwrap_or_else(|_| "<NON_UTF8_PATH>"); + let string = std::str::from_utf8(subtree).unwrap_or("<NON_UTF8_PATH>");🧰 Tools
🪛 GitHub Check: clippy
[warning] 214-214: this expression creates a reference which is immediately dereferenced by the compiler
warning: this expression creates a reference which is immediately dereferenced by the compiler
--> grovedb/src/replication.rs:214:42
|
214 | let string = std::str::from_utf8(&subtree).unwrap_or_else(|_| "<NON_UTF8_PATH>");
| ^^^^^^^^ help: change this to:subtree
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
[warning] 214-214: unnecessary closure used to substitute value for
Result::Err
warning: unnecessary closure used to substitute value forResult::Err
--> grovedb/src/replication.rs:214:22
|
214 | let string = std::str::from_utf8(&subtree).unwrap_or_else(|_| "<NON_UTF8_PATH>");
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations
= note:#[warn(clippy::unnecessary_lazy_evaluations)]
on by default
help: useunwrap_or
instead
|
214 | let string = std::str::from_utf8(&subtree).unwrap_or("<NON_UTF8_PATH>");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~grovedb/src/replication/state_sync_session.rs (3)
43-43
: Elide explicit lifetimes
The'db
lifetime inimpl<'db> SubtreeStateSyncInfo<'db>
could potentially be elided for clarity if no advanced lifetime constraints are required.🧰 Tools
🪛 GitHub Check: clippy
[warning] 43-43: the following explicit lifetimes could be elided: 'db
warning: the following explicit lifetimes could be elided: 'db
--> grovedb/src/replication/state_sync_session.rs:43:6
|
43 | impl<'db> SubtreeStateSyncInfo<'db> {
| ^^^ ^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_lifetimes
help: elide the lifetimes
|
43 - impl<'db> SubtreeStateSyncInfo<'db> {
43 + impl SubtreeStateSyncInfo<'_> {
|
263-263
: Avoid cloningbool
s
You’re calling.clone()
onis_sum_tree
which implementsCopy
. Removing.clone()
is more concise and avoids minor overhead.- subtree_state_sync.is_sum_tree.clone(), + subtree_state_sync.is_sum_tree,🧰 Tools
🪛 GitHub Check: clippy
[warning] 263-263: using
clone
on typebool
which implements theCopy
trait
warning: usingclone
on typebool
which implements theCopy
trait
--> grovedb/src/replication/state_sync_session.rs:263:21
|
263 | subtree_state_sync.is_sum_tree.clone(),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try removing theclone
call:subtree_state_sync.is_sum_tree
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#clone_on_copy
= note:#[warn(clippy::clone_on_copy)]
on by default
406-408
: Avoid cloning fixed-size arrays
Clippy warns about cloning[u8; 32]
. As these areCopy
, simply dereference instead of cloning:- elem_value_hash.clone(), - actual_value_hash.clone(), - prefix.clone(), + *elem_value_hash, + *actual_value_hash, + *prefix,🧰 Tools
🪛 GitHub Check: clippy
[warning] 408-408: using
clone
on type[u8; 32]
which implements theCopy
trait
warning: usingclone
on type[u8; 32]
which implements theCopy
trait
--> grovedb/src/replication/state_sync_session.rs:408:21
|
408 | prefix.clone(),
| ^^^^^^^^^^^^^^ help: try dereferencing it:*prefix
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#clone_on_copy
[warning] 407-407: using
clone
on type[u8; 32]
which implements theCopy
trait
warning: usingclone
on type[u8; 32]
which implements theCopy
trait
--> grovedb/src/replication/state_sync_session.rs:407:26
|
407 | Some(actual_value_hash.clone()),
| ^^^^^^^^^^^^^^^^^^^^^^^^^ help: try dereferencing it:*actual_value_hash
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#clone_on_copy
[warning] 406-406: using
clone
on type[u8; 32]
which implements theCopy
trait
warning: usingclone
on type[u8; 32]
which implements theCopy
trait
--> grovedb/src/replication/state_sync_session.rs:406:21
|
406 | elem_value_hash.clone(),
| ^^^^^^^^^^^^^^^^^^^^^^^ help: try dereferencing it:*elem_value_hash
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#clone_on_copygrovedb/src/lib.rs (1)
384-391
: High complexity return type
Clippy’s warning suggests factoring out(Merk<PrefixedRocksDbImmediateStorageContext<'tx>>, Option<Vec<u8>>, bool)
into a named type or struct for clarity:🧰 Tools
🪛 GitHub Check: clippy
[warning] 384-391: very complex type used. Consider factoring parts into
type
definitions
warning: very complex type used. Consider factoring parts intotype
definitions
--> grovedb/src/lib.rs:384:10
|
384 | ) -> Result<
| __________^
385 | | (
386 | | Merk<PrefixedRocksDbImmediateStorageContext<'tx>>,
387 | | Option<Vec>,
... |
390 | | Error,
391 | | >
| |_____^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#type_complexity
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
grovedb/src/lib.rs
(5 hunks)grovedb/src/replication.rs
(3 hunks)grovedb/src/replication/state_sync_session.rs
(1 hunks)storage/src/rocksdb_storage/storage.rs
(4 hunks)storage/src/storage.rs
(4 hunks)tutorials/Cargo.toml
(1 hunks)tutorials/src/bin/replication.rs
(5 hunks)
🧰 Additional context used
🪛 GitHub Check: clippy
grovedb/src/lib.rs
[warning] 384-391: very complex type used. Consider factoring parts into type
definitions
warning: very complex type used. Consider factoring parts into type
definitions
--> grovedb/src/lib.rs:384:10
|
384 | ) -> Result<
| __________^
385 | | (
386 | | Merk<PrefixedRocksDbImmediateStorageContext<'tx>>,
387 | | Option<Vec>,
... |
390 | | Error,
391 | | >
| |_____^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#type_complexity
grovedb/src/replication/state_sync_session.rs
[warning] 408-408: using clone
on type [u8; 32]
which implements the Copy
trait
warning: using clone
on type [u8; 32]
which implements the Copy
trait
--> grovedb/src/replication/state_sync_session.rs:408:21
|
408 | prefix.clone(),
| ^^^^^^^^^^^^^^ help: try dereferencing it: *prefix
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#clone_on_copy
[warning] 407-407: using clone
on type [u8; 32]
which implements the Copy
trait
warning: using clone
on type [u8; 32]
which implements the Copy
trait
--> grovedb/src/replication/state_sync_session.rs:407:26
|
407 | Some(actual_value_hash.clone()),
| ^^^^^^^^^^^^^^^^^^^^^^^^^ help: try dereferencing it: *actual_value_hash
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#clone_on_copy
[warning] 406-406: using clone
on type [u8; 32]
which implements the Copy
trait
warning: using clone
on type [u8; 32]
which implements the Copy
trait
--> grovedb/src/replication/state_sync_session.rs:406:21
|
406 | elem_value_hash.clone(),
| ^^^^^^^^^^^^^^^^^^^^^^^ help: try dereferencing it: *elem_value_hash
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#clone_on_copy
[warning] 263-263: using clone
on type bool
which implements the Copy
trait
warning: using clone
on type bool
which implements the Copy
trait
--> grovedb/src/replication/state_sync_session.rs:263:21
|
263 | subtree_state_sync.is_sum_tree.clone(),
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try removing the clone
call: subtree_state_sync.is_sum_tree
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#clone_on_copy
= note: #[warn(clippy::clone_on_copy)]
on by default
[warning] 43-43: the following explicit lifetimes could be elided: 'db
warning: the following explicit lifetimes could be elided: 'db
--> grovedb/src/replication/state_sync_session.rs:43:6
|
43 | impl<'db> SubtreeStateSyncInfo<'db> {
| ^^^ ^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_lifetimes
help: elide the lifetimes
|
43 - impl<'db> SubtreeStateSyncInfo<'db> {
43 + impl SubtreeStateSyncInfo<'_> {
|
grovedb/src/replication.rs
[warning] 225-225: very complex type used. Consider factoring parts into type
definitions
warning: very complex type used. Consider factoring parts into type
definitions
--> grovedb/src/replication.rs:225:6
|
225 | ) -> Result<(crate::SubtreePrefix, Option<Vec>, bool, Vec), Error> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#type_complexity
= note: #[warn(clippy::type_complexity)]
on by default
[warning] 214-214: this expression creates a reference which is immediately dereferenced by the compiler
warning: this expression creates a reference which is immediately dereferenced by the compiler
--> grovedb/src/replication.rs:214:42
|
214 | let string = std::str::from_utf8(&subtree).unwrap_or_else(|_| "<NON_UTF8_PATH>");
| ^^^^^^^^ help: change this to: subtree
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
[warning] 214-214: unnecessary closure used to substitute value for Result::Err
warning: unnecessary closure used to substitute value for Result::Err
--> grovedb/src/replication.rs:214:22
|
214 | let string = std::str::from_utf8(&subtree).unwrap_or_else(|_| "<NON_UTF8_PATH>");
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations
= note: #[warn(clippy::unnecessary_lazy_evaluations)]
on by default
help: use unwrap_or
instead
|
214 | let string = std::str::from_utf8(&subtree).unwrap_or("<NON_UTF8_PATH>");
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
🔇 Additional comments (20)
tutorials/src/bin/replication.rs (3)
3-3
: Use Duration
import or remove it if unused
If you plan on using Duration
for performance measurements beyond Instant
, that's great. Otherwise, consider removing the unused import to keep the code clean.
42-42
: Verify performance impact of increased range
Expanding the loop from 5 to 100 can significantly increase inserted records (up to 5050 items). This is helpful for testing larger datasets but may slow down tutorial runs. Ensure that these higher ranges won’t overly impact performance or memory usage.
49-49
: Confirm consistency with prior loop changes
This loop mirrors the previous one, inserting the same expanded range of values. If this duplication of data is intentional, no changes are needed.
grovedb/src/replication.rs (4)
1-2
: Module import clarity
The new mod state_sync_session;
integrates well. Ensure any future additions to state_sync_session
remain consistent with overarching replication design.
21-23
: Session-based sync approach
Introducing start_syncing_session
is a clear improvement toward session-based replication. Be mindful of transaction boundaries in concurrent scenarios.
33-44
: Document newly introduced chunk ID format
This doc comment thoroughly describes [SUBTREE_PREFIX:SIZE_ROOT_KEY:ROOT_KEY:IS_SUM_TREE:CHUNK_ID]
. Ensure all call sites and maintainers reference this properly for clarity.
64-66
: Validate root hash usage
root_app_hash
is used to detect if global_chunk_id
matches the root. Confirm that upstream logic always provides a valid application hash.
grovedb/src/lib.rs (4)
184-185
: Confirm ChunkProducer
usage
Ensure that the newly imported ChunkProducer
is invoked where intended, given that replicated chunk retrieval logic now resides in replication.rs
.
222-223
: Check usage of util_encode_vec_ops
Since util_encode_vec_ops
is central to chunk serialization, confirm it’s used consistently throughout replication code to avoid diverging chunk formats.
337-376
: Open transactional Merk by prefix
The approach is logical: you retrieve a specialized storage context from get_transactional_storage_context_by_subtree_prefix
. Verify that concurrency handling is consistent with open Merk usage in other methods.
520-557
: Open non-transactional Merk by prefix
Similar to the transactional version, ensure the logic for root key checks and error handling remains parallel to avoid divergence between the two flows.
storage/src/storage.rs (4)
46-47
: Type alias alignment
pub type SubtreePrefix = [u8; blake3::OUT_LEN];
is consistent with your usage; however, confirm you aren’t duplicating this alias in multiple places. If you are, consider referencing it from one source.
94-101
: Context creation by subtree prefix
Providing a batch-ready storage context by prefix is a welcome addition. If concurrency or partial writes become more complex, consider separate modules or methods for clarity.
113-121
: Transactional context by subtree prefix
This extension mirrors the batch context style. Keep an eye on error-handling consistency around partial commits or rollbacks for the prefix-based approach.
132-139
: Immediate storage context by prefix
Ensure that replicators or direct-writers supply consistent prefix
values so that separate subtrees don’t accidentally overlap.
storage/src/rocksdb_storage/storage.rs (4)
61-61
: Elevating SubtreePrefix
to pub
scope looks good
It aligns with the need to share prefix creation across the codebase, but make sure that no inadvertent external usage is introduced without careful access control or documentation.
475-482
: Consider adding cost-based validation for imported prefixes
In get_storage_context_by_subtree_prefix
, the prefix is assumed to be correct and the method wraps it with a default cost. If the prefix is invalid or empty, there’s no direct error handling.
498-506
: Transactional context creation is consistent
The logic for building a prefixed transactional context by subtree prefix is coherent. Ensure that error pathways are considered if prefix usage eventually fails.
521-528
: Immediate context creation is consistent, but watch for concurrency
When retrieving an immediate storage context by prefix, ensure that concurrent modifications to the same subtree prefix or transaction references do not cause issues.
tutorials/Cargo.toml (1)
17-17
: Dependency on blake3
is properly introduced
This aligns with the usage in SubtreePrefix
definitions. Double-check for any potential version conflicts in other crates referencing blake3
.
47374aa
to
995aec7
Compare
return Ok(vec![]); | ||
} | ||
|
||
let completed_path = subtree_state_sync.get_current_path(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like this should return a reference instead of owned vec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one seems to be not possible. As it will violate the borrowing rule: we already have a mutable reference to subtree_state_sync a few lines above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big fan of Pin, but I tried to do it without pinning myself and realized it would take a lot of time. So this is good enough if it works.
Issue being fixed or feature implemented
This PR includes optimizations for state sync.
It includes also all fixes from #300.
What was done?
The major improvments made are:
SUBTREE_PREFIX:SIZE_ROOT_KEY:ROOT_KEY:IS_SUM_TREE:CHUNK_ID
allowing fetch_chunk function (on source side) to open directly the corresponding merkHow Has This Been Tested?
Tested with tutorials replication and in Platform strategy test.
Breaking Changes
No
Checklist:
For repository code-owners and collaborators only
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Cargo.toml
to include theblake3
crate dependency.Refactor