Modify archiver to support fast-sync. #2722
```diff
@@ -79,7 +79,7 @@ use subspace_archiving::archiver::{Archiver, NewArchivedSegment};
 use subspace_core_primitives::crypto::kzg::Kzg;
 use subspace_core_primitives::objects::BlockObjectMapping;
 use subspace_core_primitives::{BlockNumber, RecordedHistorySegment, SegmentHeader, SegmentIndex};
-use tracing::{debug, error, info, warn};
+use tracing::{debug, info, warn};

 /// Number of WASM instances is 8, this is a bit lower to avoid warnings exceeding number of
 /// instances
```
```diff
@@ -521,7 +521,6 @@ fn initialize_archiver<Block, Client, AS>(
     segment_headers_store: &SegmentHeadersStore<AS>,
     subspace_link: &SubspaceLink<Block>,
     client: &Client,
-    overridden_last_archived_block: Option<NumberFor<Block>>,
 ) -> sp_blockchain::Result<InitializedArchiver<Block>>
 where
     Block: BlockT,
```
```diff
@@ -538,12 +537,22 @@ where
         .chain_constants(best_block_hash)?
         .confirmation_depth_k();

-    let best_block_to_archive =
-        if let Some(overridden_last_archived_block) = overridden_last_archived_block {
-            overridden_last_archived_block
-        } else {
-            best_block_number.saturating_sub(confirmation_depth_k.into())
-        };
+    // Trying to get the "best block to archive" in both cases: regular and fast sync.
+    let mut best_block_to_archive = best_block_number + 1u32.into();
+    for distance in 1..=(confirmation_depth_k + 1) {
+        let block_number_candidate = best_block_number.saturating_sub(distance.into());
+        if client.hash(block_number_candidate).ok().flatten().is_some()
+            && client
+                .hash(block_number_candidate.saturating_sub(1u32.into()))
+                .ok()
+                .flatten()
+                .is_some()
+        {
+            best_block_to_archive = block_number_candidate;
+            break;
+        }
+    }

     let maybe_last_archived_block =
         find_last_archived_block(client, segment_headers_store, best_block_to_archive)?;
```

**Reviewer:** This kind of tricky code should have comments. Even with deep knowledge of the code I do not understand, just from reading it, why you need to also check the parent.

**Author:** Added a reference to fast-sync.

**Reviewer:** I am still missing the link between fast sync and the parent block. I do not understand why fast sync automatically implies the need to check for the parent block. In fact, I expect we may not have it once we restart the node after a single block insertion at the beginning of the segment. There will not be a parent block in that case, but that is fine.

**Author:** I replaced the loop snippet that confuses both of us back to the previous version with the override. I left a TODO in case we want to return to this issue.

**Reviewer:** Why? This is exactly the kind of technical debt that will live in the code for years, because no one will understand why it was done this way and not in a more straightforward way. I'd rather understand why something that should have worked didn't.
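The candidate-search loop in this hunk can be sketched in isolation. The following is a hypothetical, simplified model, not the actual node code: block numbers are plain `u64`s and a `has_block` closure stands in for `client.hash(n)` returning `Some(_)`:

```rust
// Simplified model of the "best block to archive" search: walk back from the
// tip and pick the first candidate whose parent is also present locally
// (after fast sync the local chain may have gaps near its start).
fn best_block_to_archive(
    best_block_number: u64,
    confirmation_depth_k: u64,
    has_block: impl Fn(u64) -> bool,
) -> u64 {
    // Sentinel: one past the best block, i.e. "no suitable candidate found yet".
    let mut best = best_block_number + 1;
    for distance in 1..=(confirmation_depth_k + 1) {
        let candidate = best_block_number.saturating_sub(distance);
        if has_block(candidate) && has_block(candidate.saturating_sub(1)) {
            best = candidate;
            break;
        }
    }
    best
}

fn main() {
    // Fully-synced chain: blocks 0..=100 all present; first candidate (99) wins.
    assert_eq!(best_block_to_archive(100, 10, |n| n <= 100), 99);

    // Sparse chain: only blocks 99 and 100 present. Block 99 exists but its
    // parent 98 does not, so the parent check rejects it and the sentinel
    // value (101) is returned.
    assert_eq!(
        best_block_to_archive(100, 10, |n| (99..=100).contains(&n)),
        101
    );
}
```

The sentinel initial value mirrors the diff's `best_block_number + 1u32.into()`; the second assertion shows the case the reviewer asks about, where the parent check changes the outcome.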
```diff
@@ -795,7 +804,6 @@ where
             &segment_headers_store,
             &subspace_link,
             client.as_ref(),
-            None,
         )?)
     } else {
         None
```
```diff
@@ -806,25 +814,19 @@ where
         .subscribe();

     Ok(async move {
-        let mut saved_block_import_notification = None;
+        let mut saved_block_import_notification: Option<BlockImportingNotification<Block>> = None;

         let mut block_importing_notification_stream = block_importing_notification_stream;

         'initialization_recovery: loop {
-            let overridden_last_archived_block = saved_block_import_notification
-                .as_ref()
-                .map(|notification: &BlockImportingNotification<Block>| notification.block_number);
             let archived_segment_notification_sender =
                 subspace_link.archived_segment_notification_sender.clone();

             let archiver = match maybe_archiver.take() {
                 Some(archiver) => archiver,
-                None => initialize_archiver(
-                    &segment_headers_store,
-                    &subspace_link,
-                    client.as_ref(),
-                    overridden_last_archived_block,
-                )?,
+                None => {
+                    initialize_archiver(&segment_headers_store, &subspace_link, client.as_ref())?
+                }
             };

             let InitializedArchiver {
```
```diff
@@ -852,14 +854,12 @@ where
             .await?;

             if !success {
-                let error = format!(
-                    "Failed to recover the archiver for block number: {}",
-                    block_import_notification.block_number
-                );
-                error!(error);
-
                 return Err(sp_blockchain::Error::Consensus(sp_consensus::Error::Other(
-                    error.into(),
+                    format!(
+                        "Failed to recover the archiver for block number: {}",
+                        block_import_notification.block_number
+                    )
+                    .into(),
                 )));
             } else {
                 debug!(
```

**Reviewer:** My expectation was that the archiver would just try again. For that to happen we don't need to save the block import notification; we can simply wait for the next notification before doing re-initialization, and try that in a loop until it succeeds.

**Author:** Not sure I follow: if we don't save the import notification, we skip it and fail later. The current loop structure was agreed previously.

**Reviewer:** This structure was primarily needed for the override. Off the top of my head I don't see the reason why it is needed, and it does make the code much harder to follow even if it works correctly.

**Author:** I don't understand this thread's question, please rephrase. Meanwhile, I'll try to explain how it works:

**Reviewer:** It will not be skipped if we wait for the next block import as I suggested. The archiver will restart in a fully deterministic way and continue operation just fine. The issue with the line I commented on is that it returns an error, meaning the archiver will exit and the node will crash with an error, while I think the archiver should just restart in a loop until it succeeds.
```diff
@@ -896,7 +896,7 @@ where
             .await?;

             if !success {
-                warn!(
+                debug!(
                     "Archiver detected an error. \
                     Attempting recovery with block number = {}...",
                     block_import_notification.block_number
```
```diff
@@ -952,17 +952,19 @@ where
         return Ok(true);
     }

-    let maybe_block_hash = client.hash(block_number_to_archive)?;
-
-    let Some(block_hash) = maybe_block_hash else {
-        warn!("Archiver detected an error: older block by number must always exist. ");
-
+    if block_number_to_archive > *best_archived_block_number + 1u32.into() {
+        // There is a gap between the best archived block and the block to archive.
+        // It's likely a fast sync.
         return Ok(false);
-    };
+    }
+
+    let block_hash = client
+        .hash(block_number_to_archive)?
+        .expect("Older block by number must always exist.");

     let block = client
         .block(block_hash)?
-        .expect("Older block by number must always exist");
+        .expect("Older block by hash must always exist.");

     let parent_block_hash = *block.block.header().parent_hash();
     let block_hash_to_archive = block.block.hash();
```
**Reviewer:** Logically this is nonsense: it is not possible to archive a block higher than the best block number that exists. And you had to add an unnecessary `+1` below just to compensate for an issue that wouldn't exist otherwise.

**Author:** `best_block_to_archive` is an unchanged variable name that is used only within the `find_last_archived_block` method, and it is likely a confusing one. The previous pseudocode we agreed on didn't work. Feel free to suggest a better solution.

**Reviewer:** Didn't work how and why?

**Author:** I replaced the loop snippet that confuses both of us back to the previous version with the override. I left a TODO in case we want to return to this issue.

**Reviewer:** But why? The override is a bad and redundant API. How exactly did it not work without the `+1`?
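The gap check and the disputed `+1` reduce to a one-line predicate. A hypothetical sketch with plain `u64` block numbers (`can_archive` is an invented name, not part of the codebase):

```rust
// A block can be archived only if it is at most the direct successor of the
// best archived block; a larger jump means there is a gap in the local chain,
// which is the situation fast sync leaves behind.
fn can_archive(block_number_to_archive: u64, best_archived_block_number: u64) -> bool {
    block_number_to_archive <= best_archived_block_number + 1
}

fn main() {
    assert!(can_archive(101, 100));  // direct successor: archive it
    assert!(can_archive(90, 100));   // older block: no gap ahead of best archived
    assert!(!can_archive(105, 100)); // gap of more than one block: likely fast sync
}
```

The reviewer's objection is precisely about this `+ 1`: if the "best block to archive" could never exceed the best existing block, the successor allowance would be unnecessary.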