[Epoch Sync] Initial implementation for Epoch Sync V4. #11934
Conversation
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master   #11934      +/-   ##
==========================================
+ Coverage   71.49%   71.52%   +0.03%
==========================================
  Files         810      812       +2
  Lines      164037   164452     +415
  Branches   164037   164452     +415
==========================================
+ Hits       117283   117631     +348
+ Misses      41671    41670       -1
- Partials     5083     5151      +68
```

Flags with carried forward coverage won't be shown.
Nice!
More tests could be added as a follow up task.
```rust
}

let peer = highest_height_peers
    .choose(&mut rand::thread_rng())
```
Interesting, it seems we randomly choose a peer. Might be worth a comment here, in case we have smarter ways of selecting peers in the future?
Yep.
```rust
    .choose(&mut rand::thread_rng())
    .ok_or_else(|| Error::Other("No peers to request epoch sync from".to_string()))?;

*status = SyncStatus::EpochSync(EpochSyncStatus {
```
Is it possible that the peer we are requesting epoch sync from is itself too far behind the current head? Or could a node advertise a false or very far highest_block_height and thus put additional pressure on block sync on the current node?
During sync, when booting a node, do we have an approximate idea of what the highest/latest height is?
We cannot know for sure what the real highest height is (as we would need the epoch sync proof in order to be sure... it's a circular dependency issue). The only thing we can do is try a peer and if it didn't give us a valid proof, ban it. That logic doesn't exist yet; we'll need to implement it but I'm not sure how. Let me add it to the issue list.
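The try-and-ban logic described above does not exist yet, but it could look roughly like the following sketch. Everything here is a toy stand-in: `PeerId`, `Proof`, `request_proof`, and `sync_with_retries` are hypothetical names, not nearcore APIs.

```rust
use std::collections::HashSet;

// Toy stand-ins; none of these types exist in nearcore.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct PeerId(u32);

struct Proof {
    valid: bool,
}

fn request_proof(peer: PeerId) -> Proof {
    // Pretend only peer 2 serves a valid epoch sync proof.
    Proof { valid: peer.0 == 2 }
}

/// Try peers one by one; ban any peer whose proof fails validation,
/// since we cannot know the real highest height up front.
fn sync_with_retries(peers: &[PeerId], banned: &mut HashSet<PeerId>) -> Option<PeerId> {
    for &peer in peers {
        if banned.contains(&peer) {
            continue;
        }
        let proof = request_proof(peer);
        if proof.valid {
            return Some(peer); // proof accepted, sync proceeds
        }
        banned.insert(peer); // invalid proof: ban and try the next peer
    }
    None
}

fn main() {
    let peers = [PeerId(0), PeerId(1), PeerId(2)];
    let mut banned = HashSet::new();
    let winner = sync_with_retries(&peers, &mut banned);
    println!("synced from {:?}, banned {} peers", winner, banned.len());
}
```

The point of the sketch is only the control flow: a bad proof costs the node one round trip and permanently excludes that peer, so an attacker advertising a fake height can delay sync but not block it while honest peers remain.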
One possible idea is to select a random peer from the top 50% of peers by highest_block_height, or something similar, to weed out outdated peers. Out of scope for this PR though.
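That top-50% idea can be sketched in isolation. This is a hedged illustration with a toy `PeerInfo` struct (the real nearcore peer type differs), and it substitutes a crude time-based index for `rand::thread_rng()` just to stay dependency-free:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical peer record; the real nearcore type is richer.
#[derive(Debug)]
struct PeerInfo {
    highest_block_height: u64,
}

/// Pick a random peer from the top half by advertised height,
/// to weed out peers that are themselves far behind.
fn choose_from_top_half(peers: &[PeerInfo]) -> Option<&PeerInfo> {
    if peers.is_empty() {
        return None;
    }
    let mut by_height: Vec<&PeerInfo> = peers.iter().collect();
    by_height.sort_by(|a, b| b.highest_block_height.cmp(&a.highest_block_height));
    let top_len = (by_height.len() + 1) / 2;
    // Crude entropy source standing in for rand::thread_rng().
    let nanos = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().subsec_nanos();
    Some(by_height[nanos as usize % top_len])
}

fn main() {
    let peers = vec![
        PeerInfo { highest_block_height: 10 },
        PeerInfo { highest_block_height: 100 },
        PeerInfo { highest_block_height: 90 },
        PeerInfo { highest_block_height: 50 },
    ];
    // With 4 peers, only heights 100 and 90 are eligible.
    let chosen = choose_from_top_half(&peers).unwrap();
    println!("chosen height: {}", chosen.highest_block_height);
}
```

Note this still trusts the advertised heights; it only reduces exposure to stale peers, not to peers lying about their height.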
```rust
// Check that the new node will reach a high height as well.
let new_node = node_datas.last().unwrap().client_sender.actor_handle();
test_loop.set_every_event_callback(move |test_loop_data| {
```
If we are already checking that the new node catches up to the old nodes in the run_until below, why do we need this block?
It's for debugging. When things don't work in a huge test, it's very useful to know in the visualizer where the node is, when looking at a specific event.
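The mechanics of such a debugging hook can be modeled in isolation. A minimal self-contained sketch (toy `TestLoopData` and `TestLoop`, not the real nearcore types) of how an every-event callback works:

```rust
// Toy stand-in for the data the real test loop would expose.
struct TestLoopData {
    events_handled: usize,
}

struct TestLoop {
    data: TestLoopData,
    // If present, called every time an event is handled. Debugging only.
    every_event_callback: Option<Box<dyn FnMut(&TestLoopData)>>,
}

impl TestLoop {
    fn new() -> Self {
        TestLoop { data: TestLoopData { events_handled: 0 }, every_event_callback: None }
    }

    fn set_every_event_callback(&mut self, cb: impl FnMut(&TestLoopData) + 'static) {
        self.every_event_callback = Some(Box::new(cb));
    }

    fn handle_event(&mut self) {
        self.data.events_handled += 1;
        // Fire the debug hook, if one was installed.
        if let Some(cb) = &mut self.every_event_callback {
            cb(&self.data);
        }
    }
}

fn main() {
    let mut test_loop = TestLoop::new();
    test_loop.set_every_event_callback(|data| {
        println!("handled {} events so far", data.events_handled);
    });
    for _ in 0..3 {
        test_loop.handle_event();
    }
}
```

The `Option<Box<dyn FnMut(...)>>` shape matches the field added in the diff below: zero cost when unset, and the closure can capture whatever per-node state the test wants to print.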
```rust
});
let new_node = node_datas.last().unwrap().client_sender.actor_handle();
let node0 = node_datas[0].client_sender.actor_handle();
test_loop.run_until(
```
Are we checking that the sync happened through epoch_sync and not something like block sync? Should we be monitoring the sync status somewhere?
Ah yeah... we should do that somehow... How would that work hmm...
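One possible shape for that check, sketched with a toy status enum (the real nearcore `SyncStatus` is richer): have the test record every sync status it observes, e.g. from the every-event hook, and assert afterwards that the epoch sync variant actually appeared.

```rust
// Toy status enum; a stand-in for nearcore's real SyncStatus.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum SyncStatus {
    NoSync,
    EpochSync,
    HeaderSync,
    BlockSync,
}

/// True if the recorded trace ever passed through epoch sync.
fn saw_epoch_sync(observed: &[SyncStatus]) -> bool {
    observed.iter().any(|s| *s == SyncStatus::EpochSync)
}

fn main() {
    // Statuses a hypothetical test might collect while the new node syncs.
    let observed = [
        SyncStatus::NoSync,
        SyncStatus::EpochSync,
        SyncStatus::HeaderSync,
        SyncStatus::BlockSync,
    ];
    assert!(saw_epoch_sync(&observed));
    println!("node went through epoch sync, not just block sync");
}
```

This distinguishes "the node caught up" from "the node caught up via epoch sync": a trace containing only `BlockSync` would fail the assertion even if the final height matched.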
```diff
@@ -104,6 +104,9 @@ pub struct TestLoopV2 {
     /// Shutdown flag. When this flag is true, delayed action runners will no
     /// longer post any new events to the event loop.
     shutting_down: Arc<AtomicBool>,
+    /// If present, a function to call to print something every time an event is
+    /// handled. Intended only for debugging.
+    every_event_callback: Option<Box<dyn FnMut(&TestLoopData)>>,
```
Honestly a bit wary about adding something like this... 😅 but if it's required, we can keep it.
Remaining issues:
near/near-one-project-tracking#73
I still need to go through this one more time, but reviews can be done in the meantime.