Modify archiver to support fast-sync. #2722
Not sure we need to log an error here if we exit with it anyway. The whole node will crash, and I believe the error will be logged at a higher level anyway.
My expectation was that the archiver would just try again. For that to happen we don't need to save the block import notification; we can simply wait for the next notification before re-initializing, and repeat that in a loop until it succeeds.
Not sure I follow, if we don't save the import notification - we skip it and fail later. The current loop structure was agreed previously.
This structure was primarily needed for the override. Off the top of my head I don't see why it is needed, and it does make the code much harder to follow even if it works correctly.
I don't understand this thread's question, please rephrase - meanwhile, I'll try to explain how it works:
When the archiver detects a gap between blocks it returns the control to the calling function which in turn saves the current block notification. After that, it reinitializes archiver (via initialization loop) and uses the saved block notification to archive a block again. After that, it resumes the normal process of the block notification processing. If we don't save the problematic block notification then it would be lost and the archiving process would be broken.
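The flow described above can be sketched roughly as follows. This is a toy model, not the PR's code: `BlockImported`, `GapDetected`, `Archiver`, and `run` are hypothetical names, the archiver only tracks a block number, and re-initialization is assumed to position it right before the saved block.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a block import notification.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockImported {
    pub number: u32,
}

/// Hypothetical error: the block does not directly follow the last archived one.
#[derive(Debug, PartialEq, Eq)]
pub struct GapDetected;

/// Toy archiver that only remembers the last archived block number.
pub struct Archiver {
    pub last_archived: u32,
}

impl Archiver {
    /// Assumed re-initialization: position the archiver right before `next`.
    pub fn initialize(next: u32) -> Self {
        Self { last_archived: next.saturating_sub(1) }
    }

    /// Archives the block, or reports a gap and leaves the state untouched.
    pub fn archive(&mut self, n: BlockImported) -> Result<(), GapDetected> {
        if self.last_archived.checked_add(1) != Some(n.number) {
            return Err(GapDetected);
        }
        self.last_archived = n.number;
        Ok(())
    }
}

/// Processes notifications in order and returns the archived block numbers.
pub fn run(
    mut notifications: VecDeque<BlockImported>,
    start: u32,
) -> Result<Vec<u32>, GapDetected> {
    let mut archiver = Archiver { last_archived: start };
    let mut archived = Vec::new();

    while let Some(n) = notifications.pop_front() {
        if archiver.archive(n).is_err() {
            // Gap: keep the notification, re-initialize, then replay it once.
            let saved = n;
            archiver = Archiver::initialize(saved.number);
            // A second failure for the same notification is returned to the caller.
            archiver.archive(saved)?;
        }
        archived.push(n.number);
    }
    Ok(archived)
}
```

In this sketch a second failure for the same notification propagates as an error, which corresponds to the "fail exactly once per block import attempt" behaviour discussed later in the thread.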
It will not be if we wait for the next block import as I suggested. The archiver will restart in a fully deterministic way and will continue operating just fine.
The issue with this line I commented on is that it returns an error, meaning archiver will exit and node will crash with an error. While I think archiver should just restart in a loop until it succeeds.
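For comparison, the restart-in-a-loop alternative could look something like the sketch below. It is only an illustration under assumptions: `Summary`, `run_with_restarts`, and `try_initialize` are hypothetical names, block numbers stand in for notifications, and a successful initialization is assumed to catch up to the triggering block by reading chain state.

```rust
/// Hypothetical summary of a run, used only to observe the sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct Summary {
    /// Last block archived, if any initialization succeeded.
    pub last_archived: Option<u32>,
    /// Number of initialization attempts, successful or not.
    pub attempts: usize,
}

/// Runs the archiver over a stream of imported block numbers.
///
/// `try_initialize` is a hypothetical fallible initializer: given the block
/// that triggered it, it returns the last archived block on success.
pub fn run_with_restarts(
    notifications: impl IntoIterator<Item = u32>,
    mut try_initialize: impl FnMut(u32) -> Option<u32>,
) -> Summary {
    let mut stream = notifications.into_iter();
    let mut summary = Summary { last_archived: None, attempts: 0 };

    // Outer loop: each iteration waits for a notification and attempts
    // one (re)initialization. Nothing is saved between attempts.
    while let Some(trigger) = stream.next() {
        summary.attempts += 1;
        let Some(mut last_archived) = try_initialize(trigger) else {
            // Initialization failed: wait for the next imported block.
            continue;
        };
        summary.last_archived = Some(last_archived);

        // Inner loop: normal processing until a gap shows up.
        for number in stream.by_ref() {
            if last_archived.checked_add(1) != Some(number) {
                // Gap: drop this notification and restart on the next one
                // instead of returning an error.
                break;
            }
            last_archived = number;
            summary.last_archived = Some(last_archived);
        }
    }
    summary
}
```

The function returns only when the notification stream ends, so a gap never surfaces as an error that would take the node down.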
I'm not sure this is the correct, or at least an exhaustive, condition. The fact that we don't have a block number to archive doesn't mean we can restart the archiver either.
What I think this should check is whether the gap between last archived block and current block to archive is not 1. If it isn't, archiver state will be inconsistent even if block to archive exists (which is theoretically possible). From there we need to retry archiver initialization until it succeeds (because again it is not guaranteed to in case block was imported in such a way that archiver can't be initialized).
The logic here is very fragile and has implicit assumptions that are not obvious. The main one is that the block import notification will be about the first block in the segment header, or else the archiver will either fail to initialize or initialize in the wrong state (not sure which one, and I'm too lazy to analyze all the code paths right now).
I actually don't think this will work for fast sync from what I recall because in fast sync the first block we import manually bypassing all the checks is the block at which archiver should be initialized and block import notification will be fired with the block that follows. So you have to let archiver pick last archived block and re-initialize itself properly instead of overriding last archived block like done in this PR.
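The gap condition proposed in this comment can be written as a small predicate. This is a minimal sketch with a hypothetical function name and plain `u32` block numbers, not code from the PR.

```rust
/// Returns true when the block to archive does not directly follow the last
/// archived block, i.e. the difference between them is anything other than 1.
/// Hypothetical helper for illustration only.
pub fn needs_reinitialization(last_archived: u32, block_to_archive: u32) -> bool {
    // `checked_sub` yields `None` when the block to archive is older than the
    // last archived block, which is also treated as an inconsistent state.
    block_to_archive.checked_sub(last_archived) != Some(1)
}
```

Under this check, the fast-sync case described above (block N imported manually, notification fired for N + 1) needs no restart as long as the archiver was initialized with N as its last archived block.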
I agree here - this would be an improvement. I'll change it when we agree on other points.
I don't think I understand here. Why do you think we need to try to reinitialize the archiver until it succeeds when it fails after the first reinitialization? The current loop allows failing exactly once for each block import attempt because subsequent initialization won't change anything and it's better to fail fast here.
This confuses me a lot because it's pretty much my own argument when we discussed this approach in contrast to an explicit reinitialization of the previous version.
I tested the PR by applying all the rest of the fast-sync solution and it works as expected. The overriding emerges when we need to deal with the `confirmation_depth_k` subtraction - I don't see a better way to work around this operation and am happy to implement it differently.
Overall after the last two refactoring PRs the current solution is very close to what we discussed previously (at least from my perspective) as an alternative to the event-based explicit initialization. The deviation from that (best block to archive override and saved block import notification) emerged with the practical implementation of the original sketch. Please, let me know what you think is missing.
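To make the `confirmation_depth_k` subtraction concrete, here is a small arithmetic sketch. It assumes the block to archive is the best block number minus `confirmation_depth_k`, saturating at zero; the function names and the `first_local_block` parameter are hypothetical, and the fast-sync interpretation is one reading of this thread rather than a statement about the PR's code.

```rust
/// Assumed rule: the block eligible for archiving lags the best block by
/// `confirmation_depth_k`, never going below block 0.
pub fn best_block_to_archive(best_block: u32, confirmation_depth_k: u32) -> u32 {
    best_block.saturating_sub(confirmation_depth_k)
}

/// True when the eligible block is older than the first block the node holds
/// locally, for example because earlier blocks were skipped by a fast sync.
/// `first_local_block` is a hypothetical parameter for this sketch.
pub fn points_before_local_history(
    best_block: u32,
    confirmation_depth_k: u32,
    first_local_block: u32,
) -> bool {
    best_block_to_archive(best_block, confirmation_depth_k) < first_local_block
}
```

With a depth of 100 and a node whose local history starts at block 1000, importing block 1001 would point at block 901, which the node does not have. That is one way a plain subtraction could require special handling right after a fast sync.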
I should have mentioned that each new attempt should be made after a new block is imported. Does it make more sense with this context?
Ideally we would have neither explicit reinitialization nor the issues mentioned and I do believe it is possible.
I suspect it worked as expected until it didn't. It would have failed on the next segment header, which would happen at a different point and with a different state. This can be verified by modifying your fast sync test to import a block from the second-to-last segment instead, so you can check whether the next segment is processed correctly. I bet it will not succeed.
I agree, just trying to analyze the code path and see if there are issues/improvements with what I see.
It would be great to have tests here to check such cases, but there are quite a few bounds in this function that make it difficult.