
[Epoch Sync] Initial implementation for Epoch Sync V4. #11934

Merged: 6 commits merged into near:master from epoch4 on Aug 22, 2024

Conversation

robin-near (Contributor):

Remaining issues:

near/near-one-project-tracking#73

I still need to go through this one more time, but reviews can be done in the meantime.

codecov bot commented Aug 14, 2024

Codecov Report

Attention: Patch coverage is 68.06723% with 152 lines in your changes missing coverage. Please review.

Project coverage is 71.52%. Comparing base (7d0e3d7) to head (b7e4906).
Report is 1 commit behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| chain/client/src/sync/epoch.rs | 68.31% | 40 Missing and 56 partials ⚠️ |
| chain/epoch-manager/src/lib.rs | 48.48% | 5 Missing and 12 partials ⚠️ |
| chain/client-primitives/src/types.rs | 0.00% | 11 Missing ⚠️ |
| chain/network/src/network_protocol/mod.rs | 0.00% | 7 Missing ⚠️ |
| chain/client/src/sync/header.rs | 77.77% | 2 Missing and 4 partials ⚠️ |
| core/primitives/src/epoch_sync.rs | 0.00% | 4 Missing ⚠️ |
| chain/client/src/info.rs | 0.00% | 3 Missing ⚠️ |
| chain/chain/src/test_utils/kv_runtime.rs | 0.00% | 2 Missing ⚠️ |
| chain/network/src/rate_limits/messages_limits.rs | 0.00% | 2 Missing ⚠️ |
| tools/chainsync-loadtest/src/network.rs | 0.00% | 2 Missing ⚠️ |
| ... and 2 more | | |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master   #11934      +/-   ##
==========================================
+ Coverage   71.49%   71.52%   +0.03%     
==========================================
  Files         810      812       +2     
  Lines      164037   164452     +415     
  Branches   164037   164452     +415     
==========================================
+ Hits       117283   117631     +348     
+ Misses      41671    41670       -1     
- Partials     5083     5151      +68     
```
| Flag | Coverage Δ |
|---|---|
| backward-compatibility | 0.23% <0.00%> (-0.01%) ⬇️ |
| db-migration | 0.23% <0.00%> (-0.01%) ⬇️ |
| genesis-check | 1.34% <3.16%> (+<0.01%) ⬆️ |
| integration-tests | 38.51% <67.85%> (+0.10%) ⬆️ |
| linux | 71.27% <68.06%> (+0.02%) ⬆️ |
| linux-nightly | 71.11% <68.06%> (+0.05%) ⬆️ |
| macos | 54.24% <15.54%> (-0.11%) ⬇️ |
| pytests | 1.60% <3.16%> (+<0.01%) ⬆️ |
| sanity-checks | 1.41% <3.16%> (+<0.01%) ⬆️ |
| unittests | 65.46% <15.54%> (-0.14%) ⬇️ |
| upgradability | 0.27% <0.00%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.


@staffik (Contributor) left a comment:

Nice!
More tests could be added as a follow-up task.

Resolved review threads:

- chain/client/src/client_actor.rs (1 thread, outdated)
- chain/client/src/sync/epoch.rs (4 threads, 1 outdated)
- chain/epoch-manager/src/lib.rs (1 thread)
- chain/network/src/client.rs (2 threads)
- chain/network/src/test_loop.rs (1 thread)
- core/chain-configs/src/client_config.rs (1 thread)
- chain/client-primitives/src/types.rs (1 thread)
- chain/client/src/sync/header.rs (1 thread)
```rust
}

let peer = highest_height_peers
    .choose(&mut rand::thread_rng())
```
Contributor:
Interesting, it seems we choose a peer at random. Might be worth a comment here, in case we add smarter ways of selecting peers in the future?

robin-near (Author):

Yep.

```rust
    .choose(&mut rand::thread_rng())
    .ok_or_else(|| Error::Other("No peers to request epoch sync from".to_string()))?;

*status = SyncStatus::EpochSync(EpochSyncStatus {
```
Contributor:

Is it possible that the peer we are requesting epoch sync from is itself too far behind the current head? Or could a node advertise a false/very far highest_block_height and thus put additional pressure on block sync on the current node?

During sync, when booting a node, do we have an approximate idea of what the highest/latest height is?

@robin-near (Contributor, Author) commented Aug 20, 2024:

We cannot know for sure what the real highest height is (we would need the epoch sync proof in order to be sure; it's a circular dependency issue). The only thing we can do is try a peer and, if it doesn't give us a valid proof, ban it. That logic doesn't exist yet; we'll need to implement it, but I'm not sure how. Let me add it to the issue list.
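
For illustration only, here is a minimal sketch of the "ban the peer on an invalid proof" control flow described above. Every type, trait, and function name here is hypothetical and not from this PR (the PR explicitly leaves this logic unimplemented); the real version would go through nearcore's network layer and its existing peer-banning machinery.

```rust
// Hypothetical sketch only; none of these names exist in nearcore as written.
struct EpochSyncProof;
struct PeerId(String);

enum BanReason {
    InvalidEpochSyncProof,
}

trait PeerManager {
    fn ban_peer(&self, peer: PeerId, reason: BanReason);
}

/// The flow described in the comment: accept the proof if it validates,
/// otherwise ban the sending peer and let the caller retry with another one.
fn on_epoch_sync_proof(
    proof: EpochSyncProof,
    from_peer: PeerId,
    peer_manager: &dyn PeerManager,
) -> Result<(), String> {
    if validate_proof(&proof) {
        apply_proof(proof);
        Ok(())
    } else {
        peer_manager.ban_peer(from_peer, BanReason::InvalidEpochSyncProof);
        Err("invalid epoch sync proof; retry with another peer".to_string())
    }
}

fn validate_proof(_proof: &EpochSyncProof) -> bool {
    // Real validation elided in this sketch.
    true
}

fn apply_proof(_proof: EpochSyncProof) {
    // Real application would bootstrap epoch data from the proof.
}
```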

Contributor:

One possible idea is to select a random peer from the top 50% of peers by highest_block_height, or something similar, to weed out outdated peers. Out of scope for this PR, though.
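
A minimal, self-contained sketch of that idea follows. The `PeerHeight` struct and the function name are illustrative; in nearcore the input would be something like the list of `HighestHeightPeerInfo` entries already used here.

```rust
use rand::seq::SliceRandom;

/// Illustrative stand-in for a connected peer and the height it advertises.
struct PeerHeight {
    peer_id: String,
    highest_block_height: u64,
}

/// Pick a random peer from the top half of peers by advertised height, so
/// that clearly outdated peers are never chosen as the epoch sync source.
fn choose_peer_from_top_half(peers: &[PeerHeight]) -> Option<&PeerHeight> {
    if peers.is_empty() {
        return None;
    }
    // Sort references by advertised height, highest first.
    let mut by_height: Vec<&PeerHeight> = peers.iter().collect();
    by_height.sort_by_key(|p| std::cmp::Reverse(p.highest_block_height));
    // Keep at least one peer even if the list is tiny, then choose uniformly.
    let cutoff = (by_height.len() + 1) / 2;
    by_height[..cutoff].choose(&mut rand::thread_rng()).copied()
}
```

Note that the advertised height is still unverified, so this only reduces (not eliminates) the chance of picking an outdated or dishonest peer; banning peers that serve invalid proofs would remain the real safeguard.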


```rust
// Check that the new node will reach a high height as well.
let new_node = node_datas.last().unwrap().client_sender.actor_handle();
test_loop.set_every_event_callback(move |test_loop_data| {
```
Contributor:

If we are already checking that the new node catches up to the old nodes in the run_until below, why do we need this block?

robin-near (Author):

It's for debugging. When things don't work in a huge test, it's very useful to know in the visualizer where the node is, when looking at a specific event.
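
To make the debugging intent concrete, a callback body might log the syncing node's current position next to every event. This is only a sketch, not the PR's actual callback body, and `head_height_of` is a hypothetical helper for reading the chain head through the test loop data.

```rust
test_loop.set_every_event_callback(move |test_loop_data| {
    // Hypothetical helper: read the new node's chain head via its handle.
    let height = head_height_of(test_loop_data, &new_node);
    // Shows up alongside each event in the test-loop visualizer, so you can
    // see how far the node had gotten when a given event was handled.
    tracing::info!(target: "test", new_node_height = height, "event handled");
});
```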

```rust
});
let new_node = node_datas.last().unwrap().client_sender.actor_handle();
let node0 = node_datas[0].client_sender.actor_handle();
test_loop.run_until(
```
Contributor:

Are we checking that the sync happened through epoch_sync and not something like block sync? Should we be monitoring the sync status somewhere?

robin-near (Author):

Ah yeah... we should do that somehow... How would that work hmm...
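
One way this could be done, as a sketch under the assumption that the test can read the client's `SyncStatus` through the test loop data (`sync_status_of` is a hypothetical accessor, not an existing helper): record, while the test runs, whether the new node was ever observed in the EpochSync state, and assert on that flag at the end.

```rust
use std::cell::Cell;
use std::rc::Rc;

// Shared flag flipped as soon as the new node is seen doing epoch sync.
let saw_epoch_sync = Rc::new(Cell::new(false));
let saw_epoch_sync_in_callback = saw_epoch_sync.clone();

test_loop.set_every_event_callback(move |test_loop_data| {
    // Hypothetical accessor for the new node's current SyncStatus.
    let status = sync_status_of(test_loop_data, &new_node);
    if matches!(status, SyncStatus::EpochSync(_)) {
        saw_epoch_sync_in_callback.set(true);
    }
});

// ... run_until the new node reaches the target height, as in the test ...

assert!(
    saw_epoch_sync.get(),
    "new node should have caught up via epoch sync, not block/header sync only"
);
```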

```diff
@@ -104,6 +104,9 @@ pub struct TestLoopV2 {
     /// Shutdown flag. When this flag is true, delayed action runners will no
     /// longer post any new events to the event loop.
     shutting_down: Arc<AtomicBool>,
+    /// If present, a function to call to print something every time an event is
+    /// handled. Intended only for debugging.
+    every_event_callback: Option<Box<dyn FnMut(&TestLoopData)>>,
```
Contributor:

Honestly a bit wary about adding something like this... 😅 but if it's required, we can keep it.

@robin-near added this pull request to the merge queue on Aug 22, 2024.
Merged via the queue into near:master with commit 3a9c72a on Aug 22, 2024.
28 of 30 checks passed.
@robin-near deleted the epoch4 branch on Aug 22, 2024 at 20:17.