
lightning-block-sync: Implement serialization logic for header Cache types #3600

Open
wants to merge 1 commit into main from 2025-02-header-cache-persistence

Conversation

tnull
Contributor

@tnull tnull commented Feb 13, 2025

During syncing, lightning-block-sync populates a block header Cache that is crucial for being able to cleanly disconnect previously-connected headers in case of a reorg. Moreover, the Cache can have performance benefits, as subsequently synced listeners might not need to look up all headers again from the chain source.

While this Cache is ~crucial to the clean operation of lightning-block-sync, it was previously not possible to persist it to disk due to the absence of serialization logic for the corresponding sub-types. Here, we implement said serialization logic to allow users to persist the Cache.

Making use of the serialization logic for all the sub-types, we also switch the UnboundedCache type to be a newtype wrapper around a HashMap (rather than a straight typedef) and implement TLV-based serialization logic on it.
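
For illustration, the shape of that change is roughly the following sketch. The actual implementation uses LDK's serialization traits and TLV encoding; the type names and the simple length-prefixed byte layout below are stand-ins rather than the real code:

```rust
use std::collections::HashMap;
use std::io::{self, Read, Write};

// Stand-ins for the real key/value types (`BlockHash` / `ValidatedBlockHeader`).
type BlockHash = [u8; 32];
type HeaderBytes = [u8; 80];

/// A newtype wrapper (rather than a plain type alias) around a `HashMap`,
/// so serialization logic can be implemented directly on the cache type.
pub struct UnboundedCacheSketch(pub HashMap<BlockHash, HeaderBytes>);

impl UnboundedCacheSketch {
    /// Writes the cache as a length-prefixed list of (hash, header) pairs.
    pub fn write<W: Write>(&self, w: &mut W) -> io::Result<()> {
        w.write_all(&(self.0.len() as u64).to_be_bytes())?;
        for (hash, header) in self.0.iter() {
            w.write_all(hash)?;
            w.write_all(header)?;
        }
        Ok(())
    }

    /// Reads back a cache previously written by `write`.
    pub fn read<R: Read>(r: &mut R) -> io::Result<Self> {
        let mut len_bytes = [0u8; 8];
        r.read_exact(&mut len_bytes)?;
        let mut map = HashMap::new();
        for _ in 0..u64::from_be_bytes(len_bytes) {
            let (mut hash, mut header) = ([0u8; 32], [0u8; 80]);
            r.read_exact(&mut hash)?;
            r.read_exact(&mut header)?;
            map.insert(hash, header);
        }
        Ok(Self(map))
    }
}
```

The newtype is what enables this: trait-based serialization can be implemented on the wrapper itself, whereas Rust's orphan rules would prevent implementing a foreign serialization trait directly on a plain HashMap alias.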

@tnull tnull requested a review from jkczyz February 13, 2025 12:41
@tnull tnull force-pushed the 2025-02-header-cache-persistence branch from bed21f8 to a403bf9 on February 13, 2025 13:44
@tnull tnull force-pushed the 2025-02-header-cache-persistence branch from a403bf9 to 0207228 on February 13, 2025 13:52
@TheBlueMatt
Collaborator

I'm not sure I quite understand the desire to serialize this. We use it to keep track of things as we fetch them, but generally shouldn't ever need the cache entries after we connect the blocks (and persist the things that we connected the blocks to).

@tnull
Contributor Author

tnull commented Feb 13, 2025

I'm not sure I quite understand the desire to serialize this. We use it to keep track of things as we fetch them, but generally shouldn't ever need the cache entries after we connect the blocks (and persist the things that we connected the blocks to).

Talked with @jkczyz offline about this: AFAIU, we use the cache to keep headers around that might have been connected to another fork. Say we synced a listener up to a certain tip. If it now goes offline, and a reorg happens in the meantime, lightning-block-sync might not have the necessary old header data around to call block_disconnected before re-connecting the blocks of the new best chain.

Additionally, it might bring quite some performance improvements for lightning_block_sync::init::synchronize_listeners, which first walks the chain for each listener individually before connecting blocks to them, and only inserts into the cache in this last step. Meaning: if we sync 6 chain listeners, we're walking the header chain 6 times completely uncached on restart before entering the 'block connection phase'. If we don't persist the Cache, that is.
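
Regarding the first point above (needing old header data to call block_disconnected), here is a rough sketch of the rewind step; all type and function names are hypothetical stand-ins, not the crate's actual API. Every block between the stale tip and the fork point needs to be disconnected, and the headers for those disconnections have to come from somewhere:

```rust
use std::collections::HashMap;

#[derive(Clone)]
struct Header {
    prev_hash: [u8; 32],
    height: u32,
}

trait Listener {
    fn block_disconnected(&mut self, header: &Header);
}

/// Walks back from a now-stale tip to the fork point, disconnecting each
/// block. Returns `None` if a header along the stale fork is unavailable.
fn rewind_to_fork_point(
    listener: &mut dyn Listener,
    mut stale_tip: Header,
    fork_height: u32,
    cache: &HashMap<[u8; 32], Header>,
) -> Option<()> {
    while stale_tip.height > fork_height {
        listener.block_disconnected(&stale_tip);
        // If the chain source no longer serves the stale fork's headers,
        // a (persisted) cache is the only remaining place to find them.
        stale_tip = cache.get(&stale_tip.prev_hash)?.clone();
    }
    Some(())
}
```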

@jkczyz
Contributor

jkczyz commented Feb 13, 2025

Additionally, it might bring quite some performance improvements for lightning_block_sync::init::synchronize_listeners, which first walks the chain for each listener individually before connecting blocks to them, and only inserts into the cache in this last step. Meaning: if we sync 6 chain listeners, we're walking the header chain 6 times completely uncached on restart before entering the 'block connection phase'. If we don't persist the Cache, that is.

Alternatively (or in addition), we could manually update the cache at each iteration during initialization from the ChainDifference, which would mean we'd want some AppendOnlyCache instead of a ReadOnlyCache there. That way we'd only need to query the chain source once for anything not yet cached prior to restarting.
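
A rough sketch of the kind of split being suggested, purely for illustration (all names and signatures are hypothetical, not the crate's actual types):

```rust
use std::collections::HashMap;

type BlockHash = [u8; 32];

struct CachedHeader {
    raw: [u8; 80],
    height: u32,
}

/// Read-only view, sufficient while looking up already-known headers.
trait ReadOnlyCache {
    fn look_up(&self, hash: &BlockHash) -> Option<&CachedHeader>;
}

/// Additionally allows recording headers discovered while walking the chain
/// during initialization, so later listeners and the block-connection phase
/// can reuse them without re-querying the chain source.
trait AppendOnlyCache: ReadOnlyCache {
    fn insert(&mut self, hash: BlockHash, header: CachedHeader);
}

struct HeaderCache(HashMap<BlockHash, CachedHeader>);

impl ReadOnlyCache for HeaderCache {
    fn look_up(&self, hash: &BlockHash) -> Option<&CachedHeader> {
        self.0.get(hash)
    }
}

impl AppendOnlyCache for HeaderCache {
    fn insert(&mut self, hash: BlockHash, header: CachedHeader) {
        self.0.insert(hash, header);
    }
}
```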

@TheBlueMatt
Collaborator

TheBlueMatt commented Feb 13, 2025

If it now goes offline, and a reorg happens in the meantime, lightning-block-sync might not have the necessary old header data around to call block_disconnected before re-connecting the blocks of the new best chain.

This is true if you switch to a new bitcoind, but AFAIU using the same bitcoind (data dir) will always provide us with the missing block (headers).

Additionally, it might bring quite some performance improvements for lightning_block_sync::init::synchronize_listeners, which first walks the chain for each listener individually before connecting blocks to them, and only inserts into the cache in this last step. Meaning: if we sync 6 chain listeners, we're walking the header chain 6 times completely uncached on restart before entering the 'block connection phase'. If we don't persist the Cache, that is.

This seems like a bug we could fix another way, no?

@tnull
Contributor Author

tnull commented Feb 13, 2025

Alternatively (or in addition), we could manually update the cache at each iteration during initialization from the ChainDifference, which would mean we'd want some AppendOnlyCache instead of a ReadOnlyCache there. That way we'd only need to query the chain source once for anything not yet cached prior to restarting.

I guess we could make the cache append only, but that's ~orthogonal to this PR, IMO.

This is true if you switch to a new bitcoind, but AFAIU using the same bitcoind (data dir) will always provide us with the missing block (headers).

Right, so if we want to be able to safely handle switching chain sources, we need to at least support a minimal persisted cache (in LDK Node we restrict it to 100 entries currently, but could even get away with ~ANTI_REORG_DELAY, IIUC).

This seems like a bug we could fix another way, no?

I don't think it's a bug per se. We need to walk for each listener separately to make sure we find the connection points reliably. It would be nice to make use of caching during it though.
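
To illustrate the bounded-cache idea mentioned above (keeping only ~100 entries, or on the order of ANTI_REORG_DELAY), a pruning pass before persisting might look like the following sketch (hypothetical names, not LDK Node's actual code):

```rust
use std::collections::HashMap;

struct CachedHeader {
    height: u32,
}

/// Keeps (approximately) the `max_entries` highest headers before persisting,
/// so the stored cache stays small while still covering recent reorg depth.
fn prune_before_persist(cache: &mut HashMap<[u8; 32], CachedHeader>, max_entries: usize) {
    if cache.len() <= max_entries {
        return;
    }
    let mut heights: Vec<u32> = cache.values().map(|h| h.height).collect();
    heights.sort_unstable();
    let cutoff = heights[heights.len() - max_entries];
    cache.retain(|_, h| h.height >= cutoff);
}
```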

@TheBlueMatt
Collaborator

Right, so if we want to be able to safely handle switching chain sources, we need to at least support a minimal persisted cache (in LDK Node we restrict it to 100 entries currently, but could even get away with ~ANTI_REORG_DELAY, IIUC).

If we want to support this, IMO it's pretty weird to require the cache be persisted by a downstream project; rather, if we need it (we should check whether we can just remove the header in the disconnect call and not worry about it, I think we probably can), we should store a few block headers/hashes going back in the listeners themselves, because they're the things responsible for communicating their current chain location, no?

@tnull
Contributor Author

tnull commented Feb 14, 2025

If we want to support this, IMO it's pretty weird to require the cache be persisted by a downstream project; rather, if we need it (we should check whether we can just remove the header in the disconnect call and not worry about it, I think we probably can), we should store a few block headers/hashes going back in the listeners themselves, because they're the things responsible for communicating their current chain location, no?

On LDK Node's end it would be nice to maintain the cache, also as a performance optimization. And IMO we can leave it to the user to decide if/how much data they want to persist. In any case, I'm not sure why we wouldn't want to allow for the cache to be persisted?

@TheBlueMatt
Collaborator

I'm a bit confused, though: the cache should really only be relevant when we're reorging, which shouldn't be common? Why is it a material performance difference?
