Modify archiver to support fast-sync. #2722

Closed · wants to merge 5 commits
Changes from 1 commit
68 changes: 35 additions & 33 deletions crates/sc-consensus-subspace/src/archiver.rs
@@ -79,7 +79,7 @@
 use subspace_archiving::archiver::{Archiver, NewArchivedSegment};
 use subspace_core_primitives::crypto::kzg::Kzg;
 use subspace_core_primitives::objects::BlockObjectMapping;
 use subspace_core_primitives::{BlockNumber, RecordedHistorySegment, SegmentHeader, SegmentIndex};
-use tracing::{debug, error, info, warn};
+use tracing::{debug, info, warn};

 /// Number of WASM instances is 8, this is a bit lower to avoid warnings exceeding number of
 /// instances
@@ -521,7 +521,6 @@
 fn initialize_archiver<Block, Client, AS>(
     segment_headers_store: &SegmentHeadersStore<AS>,
     subspace_link: &SubspaceLink<Block>,
     client: &Client,
-    overridden_last_archived_block: Option<NumberFor<Block>>,
 ) -> sp_blockchain::Result<InitializedArchiver<Block>>
 where
     Block: BlockT,
@@ -538,12 +537,22 @@ where
         .chain_constants(best_block_hash)?
         .confirmation_depth_k();

-    let best_block_to_archive =
-        if let Some(overridden_last_archived_block) = overridden_last_archived_block {
-            overridden_last_archived_block
+    // Trying to get the "best block to archive" in both cases: regular and fast sync.
+    let mut best_block_to_archive = best_block_number + 1u32.into();
Member: Logically this is nonsense: it is not possible to archive a block higher than the best block number that exists. And you had to add an unnecessary +1 below just to compensate for this issue, which wouldn't exist otherwise.

Member Author: best_block_to_archive is an unchanged variable name that is only used within the find_last_archived_block method, and it is likely a confusing one. The previous pseudocode we agreed on didn't work. Feel free to suggest a better solution.

Member: Didn't work how and why?

Member Author: I replaced the loop snippet that confuses both of us back to the previous version with the override. I left a TODO in case we want to return to this issue.

Member: But why? The override is a bad and redundant API. How exactly did it not work without the +1?

+    for distance in 1..=(confirmation_depth_k + 1) {
+        let block_number_candidate = best_block_number.saturating_sub(distance.into());
+        if client.hash(block_number_candidate).ok().flatten().is_some()
+            && client
+                .hash(block_number_candidate.saturating_sub(1u32.into()))
+                .ok()
+                .flatten()
+                .is_some()
Member: This kind of tricky code should have comments. Even with deep knowledge of the code, I do not understand from reading it why you need to also check the parent.

Member Author: Added a reference to fast sync.

Member:
> // We might add the block on the fast sync and we need to check for the parent block
> // availability as well to continue iteration.

I am still missing the link between fast sync and the parent block. I do not understand why fast sync automatically implies the need to check for the parent block. In fact, I expect we may not have it once we restart the node after a single block insertion at the beginning of a segment. There will be no parent block in that case, but that is fine.

Member Author: I replaced the loop snippet that confuses both of us back to the previous version with the override. I left a TODO in case we want to return to this issue.

Member: Why? This is exactly the kind of technical debt that will live in the code for years, because no one will understand why it was done this way and not in a more straightforward way. I'd rather understand why something that should have worked didn't.

+        {
+            best_block_to_archive = block_number_candidate;
-        } else {
-            best_block_number.saturating_sub(confirmation_depth_k.into())
-        };
+            break;
+        }
+    }

     let maybe_last_archived_block =
         find_last_archived_block(client, segment_headers_store, best_block_to_archive)?;
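For reference, the added search reads roughly as follows. This is a self-contained sketch, not the crate's code: a hypothetical `chain: &[Option<u64>]` slice stands in for the Substrate client (index = block number, `Some` = block available locally), and plain `u32` arithmetic replaces `NumberFor<Block>`. The second assertion reproduces the restart-after-single-block-insertion scenario raised in the thread above, where no candidate has an available parent and the result falls through to `best + 1`, the initialization the first reviewer objects to.

```rust
// Illustrative stand-ins only: `chain` models local block availability by
// number; the real code queries `client.hash(number)`.
fn best_block_to_archive(chain: &[Option<u64>], best: u32, confirmation_depth_k: u32) -> u32 {
    let available = |n: u32| matches!(chain.get(n as usize), Some(Some(_)));
    // Mirrors the diff: start above the best block, then walk down looking
    // for a block whose parent is also available (fast sync may have
    // inserted blocks without their ancestors).
    let mut best_block_to_archive = best + 1;
    for distance in 1..=(confirmation_depth_k + 1) {
        let candidate = best.saturating_sub(distance);
        if available(candidate) && available(candidate.saturating_sub(1)) {
            best_block_to_archive = candidate;
            break;
        }
    }
    best_block_to_archive
}

fn main() {
    // Regular-looking chain tail: blocks 5..=7 are present, best = 7.
    let mut chain = vec![None; 8];
    for n in 5..=7 {
        chain[n] = Some(n as u64);
    }
    assert_eq!(best_block_to_archive(&chain, 7, 100), 6);

    // Restart after a single block insertion: only block 7 is present, so no
    // candidate has an available parent and the result stays at best + 1.
    let mut sparse = vec![None; 8];
    sparse[7] = Some(7);
    assert_eq!(best_block_to_archive(&sparse, 7, 100), 8);
}
```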
@@ -795,7 +804,6 @@ where
             &segment_headers_store,
             &subspace_link,
             client.as_ref(),
-            None,
         )?)
     } else {
         None
@@ -806,25 +814,19 @@
         .subscribe();

     Ok(async move {
-        let mut saved_block_import_notification = None;
+        let mut saved_block_import_notification: Option<BlockImportingNotification<Block>> = None;

         let mut block_importing_notification_stream = block_importing_notification_stream;

         'initialization_recovery: loop {
-            let overridden_last_archived_block = saved_block_import_notification
-                .as_ref()
-                .map(|notification: &BlockImportingNotification<Block>| notification.block_number);
             let archived_segment_notification_sender =
                 subspace_link.archived_segment_notification_sender.clone();

             let archiver = match maybe_archiver.take() {
                 Some(archiver) => archiver,
-                None => initialize_archiver(
-                    &segment_headers_store,
-                    &subspace_link,
-                    client.as_ref(),
-                    overridden_last_archived_block,
-                )?,
+                None => {
+                    initialize_archiver(&segment_headers_store, &subspace_link, client.as_ref())?
+                }
             };

             let InitializedArchiver {
@@ -852,14 +854,12 @@ where
                     .await?;

                 if !success {
-                    let error = format!(
-                        "Failed to recover the archiver for block number: {}",
-                        block_import_notification.block_number
-                    );
-                    error!(error);
-
                     return Err(sp_blockchain::Error::Consensus(sp_consensus::Error::Other(
Member: My expectation was that the archiver would just try again. For that to happen we don't need to save the block import notification; we can simply wait for the next notification before doing re-initialization, and try that in a loop until it succeeds.

Member Author: Not sure I follow: if we don't save the import notification, we skip it and fail later. The current loop structure was agreed on previously.

Member: This structure was primarily needed for the override. Off the top of my head I don't see why it is needed, and it does make the code much harder to follow even if it works correctly.

Member Author: I don't understand this thread's question, please rephrase. Meanwhile, I'll try to explain how it works: when the archiver detects a gap between blocks, it returns control to the calling function, which in turn saves the current block notification. After that, it reinitializes the archiver (via the initialization loop) and uses the saved block notification to archive the block again. It then resumes the normal processing of block notifications. If we don't save the problematic block notification then it would be lost and the archiving process would be broken.

Member:
> If we don't save the problematic block notification then it would be lost and the archiving process would be broken.

It will not be, if we wait for the next block import as I suggested. The archiver will restart in a fully deterministic way and will continue operation just fine.

The issue with the line I commented on is that it returns an error, meaning the archiver will exit and the node will crash. I think the archiver should instead just restart in a loop until it succeeds.

+                        format!(
+                            "Failed to recover the archiver for block number: {}",
+                            block_import_notification.block_number
+                        )
+                        .into(),
                     )));
                 } else {
                     debug!(
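To make the suggestion from the thread above concrete, here is a loose, self-contained sketch of the proposed restart strategy. `Archiver`, `initialize`, and `archive` are hypothetical stand-ins, not the crate's API: the failed notification is simply dropped, re-initialization is re-derived deterministically from chain state, and processing resumes with the next notification instead of replaying a saved one or returning an error.

```rust
struct Archiver {
    last_archived: u32,
}

// Stand-in for `initialize_archiver`: deterministic, rebuilt from segment
// headers and chain state rather than from a saved notification.
fn initialize(best_block: u32) -> Archiver {
    Archiver { last_archived: best_block }
}

// Stand-in for archiving one block; fails when a gap is detected.
fn archive(archiver: &mut Archiver, block_number: u32) -> bool {
    if block_number > archiver.last_archived + 1 {
        return false;
    }
    archiver.last_archived = block_number;
    true
}

fn main() {
    // Fast sync introduces a gap before block 5.
    let notifications = [1u32, 2, 5, 6, 7];
    let mut archiver = initialize(0);
    for block_number in notifications {
        if !archive(&mut archiver, block_number) {
            // Drop the failed notification and re-initialize from chain
            // state; the next notification continues normally, so the
            // node never exits with an error here.
            archiver = initialize(block_number);
        }
    }
    assert_eq!(archiver.last_archived, 7);
}
```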
@@ -896,7 +896,7 @@ where
                     .await?;

                 if !success {
-                    warn!(
+                    debug!(
                         "Archiver detected an error. \
                         Attempting recovery with block number = {}...",
                         block_import_notification.block_number
@@ -952,17 +952,19 @@ where
             return Ok(true);
         }

-        let maybe_block_hash = client.hash(block_number_to_archive)?;
-
-        let Some(block_hash) = maybe_block_hash else {
-            warn!("Archiver detected an error: older block by number must always exist. ");
+        if block_number_to_archive > *best_archived_block_number + 1u32.into() {
+            // There is a gap between the best archived block and the block to archive.
+            // It's likely a fast sync.
             return Ok(false);
-        };
+        }

+        let block_hash = client
+            .hash(block_number_to_archive)?
+            .expect("Older block by number must always exist.");

         let block = client
             .block(block_hash)?
-            .expect("Older block by number must always exist");
+            .expect("Older block by hash must always exist.");

         let parent_block_hash = *block.block.header().parent_hash();
         let block_hash_to_archive = block.block.hash();
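A tiny worked example of the gap check introduced above, as a sketch with plain `u32` block numbers standing in for `NumberFor<Block>`:

```rust
fn is_gap(block_number_to_archive: u32, best_archived_block_number: u32) -> bool {
    // A gap means the block does not directly follow the best archived one,
    // e.g. after a fast sync jump; the archiver then returns Ok(false) and
    // enters the recovery path instead of panicking on a missing hash.
    block_number_to_archive > best_archived_block_number + 1
}

fn main() {
    assert!(!is_gap(101, 100)); // next block in sequence: archive normally
    assert!(is_gap(150, 100)); // fast-sync jump: trigger recovery
}
```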