
Initial Snap sync for arbitrum #275

Open. Wants to merge 35 commits into master.

Conversation

@tsahee (Contributor) commented on Nov 28, 2023:

No description provided.

@cla-bot added the s CLA signed label on Nov 28, 2023.
@Tristan-Wilson (Member) left a comment:

Sometimes when I run the test, it fails:
snap_sync.txt

    log.Warn("hPeer not found on HandleLastConfirmed")
    return
}
peer.RequestCheckpoint(nil)
A Member commented:

Should we check the response here?

@tsahee (Contributor, Author) replied:

This function only sends the checkpoint request message.
The peer is expected to respond with a checkpoint message, and that response will be handled by the HandleCheckpoint function.
We might/should eventually add things like retrying this request or dropping the peer if there is no response, but that should be part of a later PR.
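To illustrate the request/response split being described, here is a minimal standalone sketch in Go; the types, message code, and function names below are assumptions for illustration, not the PR's actual code:

package sketch

// Hypothetical stand-ins for the real peer and message types.
type peer struct{}

// RequestCheckpoint only sends the checkpoint request message; no reply is
// awaited at the call site.
func (p *peer) RequestCheckpoint(hash []byte) {}

type msg struct{ Code uint64 }

const checkpointMsg = 0x20 // assumed message code

type handler struct{}

// HandleCheckpoint processes the peer's eventual checkpoint reply.
func (h *handler) HandleCheckpoint(p *peer, m msg) {}

// handleMsg shows how the reply is routed to HandleCheckpoint independently
// of whichever code path issued RequestCheckpoint.
func (h *handler) handleMsg(p *peer, m msg) {
    switch m.Code {
    case checkpointMsg:
        h.HandleCheckpoint(p, m)
    }
}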

defer hPeer.mutex.Unlock()
hPeer.arb = nil
if hPeer.eth != nil {
    hPeer.eth.Disconnect(p2p.DiscSelf)
@magicxyyz (Contributor) commented on May 30, 2024:

Is there a reason why we pass p2p.DiscSelf here instead of e.g. p2p.DiscUselessPeer?

Like here, in go-ethereum/eth/handler.go (lines 475 to 481 at 021ff6f):

// removePeer requests disconnection of a peer.
func (h *handler) removePeer(id string) {
    peer := h.peers.peer(id)
    if peer != nil {
        peer.Peer.Disconnect(p2p.DiscUselessPeer)
    }
}
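Applied to the snippet under review, the suggested change would look roughly like this (a sketch of the suggestion only, not necessarily what the PR adopted):

if hPeer.eth != nil {
    // Report the peer as useless, following the go-ethereum convention quoted
    // above, instead of signalling a self-disconnect.
    hPeer.eth.Disconnect(p2p.DiscUselessPeer)
}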

if canonical == (common.Hash{}) {
    skeleton := rawdb.ReadSkeletonHeader(h.db, number)
    if skeleton == nil {
        log.Error("arbitrum handler_p2p: canonical not found", "number", number, "peer", peer.ID())
A Contributor commented:

We probably should panic here with a descriptive message; otherwise the panic will happen at line 333, canonical = skeleton.Hash(). If we hit this error, then either there is a bug in waitBlockSync or the database is broken.

Or we could put line 333 into an else block and fail somewhere later?

@tsahee (Contributor, Author) replied:

Nice catch!
We don't need to panic, just return (effectively: ignore the received checkpoint message).
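A minimal sketch of the handling agreed on here, assuming it sits inside the checkpoint handler and following the snippet quoted above:

if canonical == (common.Hash{}) {
    skeleton := rawdb.ReadSkeletonHeader(h.db, number)
    if skeleton == nil {
        log.Error("arbitrum handler_p2p: canonical not found", "number", number, "peer", peer.ID())
        // Ignore the received checkpoint message instead of panicking; a later
        // checkpoint from this or another peer may still succeed.
        return
    }
    canonical = skeleton.Hash()
}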

A Contributor replied:

Should we also drop the peer then?

I am still not sure how we should handle this corner case, because if we can't read the skeleton header from the database here, then either the database is corrupted or waitBlockSync didn't work.
In waitBlockSync (called a few lines before) we wait until filler.Resume sets h.syncedBlockNum to a number greater than or equal to the checkpoint block number. filler.Resume reads the number from the header returned by h.downloader.SkeletonHead(), which internally reads the header from the database. So it seems impossible to read the header from the db once and then fail to read it the second time, unless something is broken.

Does that make sense?
And if so, should we log a critical error here (or panic)?
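For readers outside the thread, a rough sketch of the ordering described above; most identifiers are taken from the comment itself, checkpointNumber is a placeholder, and the actual control flow in the PR may differ:

// Hypothetical shape of the flow under discussion, not the PR's actual code.
// waitBlockSync blocks until filler.Resume has advanced h.syncedBlockNum
// (derived from h.downloader.SkeletonHead(), which itself reads the skeleton
// header from the database) to at least the checkpoint block number.
h.waitBlockSync(checkpointNumber)

// By this point the skeleton header for checkpointNumber has already been
// read from the database once, so a nil result here implies a corrupted
// database or a bug in the synchronization above.
skeleton := rawdb.ReadSkeletonHeader(h.db, checkpointNumber)
if skeleton == nil {
    // the corner case debated in this thread
}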

@tsahee (Contributor, Author) commented on Jun 7, 2024:

I'm not sure how we could fail this test; I don't have a good scenario that doesn't require some bug somewhere.
Say we do have a race bug in waitForBlockSync: in that case, a return is the right approach. Maybe if we get another checkpoint from this or another peer it will work.

Generally, if you get an unexpected error you should try to contain it; just make sure you don't cause illegal values elsewhere. A panic would be the opposite of containing the error.

There is no reason to assume anything is wrong with this peer, so a drop is unnecessary either. (Later we should add logic for handling peers that DoS us, etc.)

}
nodeIter, err := storageTrie.NodeIterator(origin[:])
if err != nil {
    log.Error("Failed node iterator to open storage trie", "root", acc.Root, "err", err)
A Contributor commented:

Seems that the error message needs fixing.
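A possible rewording of the flagged message (a sketch only; the wording the PR actually adopted may differ):

log.Error("Failed to open node iterator for storage trie", "root", acc.Root, "err", err)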

@tsahee marked this pull request as ready for review on June 6, 2024, 23:33.
Labels: s CLA signed
3 participants