Initial Snap sync for arbitrum #275
base: master
Conversation
checkpoint is verified via skeleton
Sometimes when I run the test, it fails:
snap_sync.txt
```go
	log.Warn("hPeer not found on HandleLastConfirmed")
	return
}
peer.RequestCheckpoint(nil)
```
Should we check the response here?
This function only sends the checkpoint request message.
The peer is expected to respond with a checkpoint message, and that will be handled by the HandleCheckpoint function.
We might/should eventually add things like retrying this request or removing the peer if there is no response, but that should be part of a later PR.
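For intuition, here is a minimal toy model of that request/response split, assuming a simple message loop; all types, names, and channels below are illustrative and are not the PR's actual code:

```go
package main

import "fmt"

// Hypothetical message codes; the real protocol defines its own.
const (
	checkpointReqMsg = iota
	checkpointMsg
)

type msg struct {
	code    int
	payload string
}

type peer struct{ out chan msg }

// RequestCheckpoint only writes the request message; it never blocks
// waiting for a reply, which is why there is no response to check at
// the call site.
func (p *peer) RequestCheckpoint() {
	p.out <- msg{code: checkpointReqMsg}
}

// handleCheckpoint stands in for HandleCheckpoint: it runs when the
// peer's checkpoint response eventually arrives.
func handleCheckpoint(m msg) {
	fmt.Println("checkpoint received:", m.payload)
}

// messageLoop dispatches incoming messages, so responses are handled
// here rather than at the point where the request was sent.
func messageLoop(in <-chan msg) {
	for m := range in {
		if m.code == checkpointMsg {
			handleCheckpoint(m)
		}
	}
}

func main() {
	network := make(chan msg, 2)
	p := &peer{out: network}
	p.RequestCheckpoint() // fire-and-forget: nothing to check here
	// Simulate the remote peer answering later:
	network <- msg{code: checkpointMsg, payload: "block 1234"}
	close(network)
	messageLoop(network)
}
```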
arbitrum/handler_p2p.go (outdated)
```go
defer hPeer.mutex.Unlock()
hPeer.arb = nil
if hPeer.eth != nil {
	hPeer.eth.Disconnect(p2p.DiscSelf)
```
Is there a reason why we pass p2p.DiscSelf here instead of e.g. p2p.DiscUselessPeer? Like here:
Lines 475 to 481 in 021ff6f
```go
// removePeer requests disconnection of a peer.
func (h *handler) removePeer(id string) {
	peer := h.peers.peer(id)
	if peer != nil {
		peer.Peer.Disconnect(p2p.DiscUselessPeer)
	}
}
```
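If the goal is to mirror removePeer, the suggestion would amount to a one-word change in the snippet above (a sketch of the proposed edit, not a tested patch):

```go
if hPeer.eth != nil {
	// p2p.DiscUselessPeer tells the remote "you are of no use to us",
	// whereas p2p.DiscSelf means "connected to self", which reads oddly here.
	hPeer.eth.Disconnect(p2p.DiscUselessPeer)
}
```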
```go
if canonical == (common.Hash{}) {
	skeleton := rawdb.ReadSkeletonHeader(h.db, number)
	if skeleton == nil {
		log.Error("arbitrum handler_p2p: canonical not found", "number", number, "peer", peer.ID())
```
We probably should panic here with some descriptive message; otherwise the panic will happen at line 333, canonical = skeleton.Hash(). If we hit this error, then either there is a bug in waitBlockSync or the database is broken.
Or we could put line 333 into an else block and fail somewhere later?
nice catch!
we don't need to panic, just return (effectively: ignore the received checkpoint message)
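Put together, the agreed handling would look roughly like this (a sketch extending the snippet quoted above; the final assignment is the line-333 statement mentioned earlier):

```go
if canonical == (common.Hash{}) {
	skeleton := rawdb.ReadSkeletonHeader(h.db, number)
	if skeleton == nil {
		log.Error("arbitrum handler_p2p: canonical not found", "number", number, "peer", peer.ID())
		// Ignore the received checkpoint message rather than letting
		// skeleton.Hash() below panic on a nil header.
		return
	}
	canonical = skeleton.Hash()
}
```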
should we also drop the peer then?
I am still not sure how we should handle this corner case, because if we can't read the skeleton header from the database here, then either the database is corrupted or waitBlockSync didn't work.
In waitBlockSync (called a few lines earlier) we wait until filler.Resume sets h.syncedBlockNum to a number greater than or equal to the checkpoint block number. filler.Resume reads the number from the header returned by h.downloader.SkeletonHead(), which internally reads the header from the database. So it seems impossible to read the header from the database once and then fail to read it a second time, unless something is broken.
Does that make sense? And if so, should we log a critical error here (or panic)?
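A toy model of the invariant being argued (names follow the PR; the synchronization details below are assumptions made up for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// handler models just enough state for waitBlockSync. The real code
// lives in arbitrum/handler_p2p.go; everything here is illustrative.
type handler struct {
	mu             sync.Mutex
	cond           *sync.Cond
	syncedBlockNum uint64
}

func newHandler() *handler {
	h := &handler{}
	h.cond = sync.NewCond(&h.mu)
	return h
}

// resume plays the role of filler.Resume: it takes the skeleton head
// number (which the real code obtains via h.downloader.SkeletonHead(),
// reading the header from the database) and publishes it.
func (h *handler) resume(skeletonHeadNum uint64) {
	h.mu.Lock()
	h.syncedBlockNum = skeletonHeadNum
	h.mu.Unlock()
	h.cond.Broadcast()
}

// waitBlockSync blocks until the synced block number reaches the
// checkpoint's number - so by the time it returns, the skeleton header
// for that number has already been read from the database once.
func (h *handler) waitBlockSync(checkpointNum uint64) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for h.syncedBlockNum < checkpointNum {
		h.cond.Wait()
	}
}

func main() {
	h := newHandler()
	go h.resume(1234) // the skeleton head reaches the checkpoint height
	h.waitBlockSync(1234)
	fmt.Println("synced; a second ReadSkeletonHeader should not fail now")
}
```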
I'm not sure how we could fail this test. I don't have a good scenario that doesn't require some bug somewhere.
Say we do have a race bug in waitBlockSync - in that case, a return is the right approach. Maybe if we get another checkpoint from this or another peer it will work.
Generally, if you get an unexpected error you should try to contain it. Just make sure you don't cause illegal values elsewhere. A panic would be the opposite of containing the error.
There is no reason to assume anything is wrong with this peer, so dropping it is unnecessary as well. (Later we should add logic to handle peers DoSing us, etc.)
arbitrum/handler_p2p.go (outdated)
```go
}
nodeIter, err := storageTrie.NodeIterator(origin[:])
if err != nil {
	log.Error("Failed node iterator to open storage trie", "root", acc.Root, "err", err)
```
seems that the error message needs fixing
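One possible rewording, for what it's worth (a suggestion, not the merged text):

```go
log.Error("Failed to open node iterator on storage trie", "root", acc.Root, "err", err)
```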