Stop relying on ChannelMonitor persistence after manager read #3322

Open · wants to merge 9 commits into main

Conversation

TheBlueMatt
Collaborator

When we discover we've only partially claimed an MPP HTLC during
`ChannelManager` reading, we need to add the payment preimage to
all other `ChannelMonitor`s that were a part of the payment.

We previously did this with a direct call on the `ChannelMonitor`,
requiring users to write the full `ChannelMonitor` to disk to
ensure that the updated information made it there.

This adds quite a bit of delay during initial startup - fully
resilvering each `ChannelMonitor` just to handle this one case is
incredibly excessive.

Instead, we rewrite the MPP claim replay logic to use only (new)
data included in the `ChannelMonitor`s. This has the nice side
effect of teeing up future `ChannelManager`-non-persistence
features, and it also makes our `PaymentClaimed` event generation
much more robust.

When we started tracking which channels had MPP parts claimed
durably on-disk in their `ChannelMonitor`, we did so with a tuple.
This was fine in that it was only ever accessed in two places, but
as we will start tracking it through to the `ChannelMonitor`s
themselves in the coming commit(s), it is useful to have it in a
struct instead.
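The shape of that change might look something like this minimal sketch; the struct, field names, and types are illustrative, not LDK's actual definitions:

```rust
// Hypothetical sketch: replacing a bare tuple with a named struct. All
// names and types here are illustrative, not LDK's actual definitions.

// Before: an anonymous tuple tracking where an MPP part was claimed.
// type MppClaimPart = ([u8; 33], u64); // (counterparty node id, HTLC id)

/// After: the same data with self-documenting fields, which is much easier
/// to extend as it starts flowing through to the `ChannelMonitor`s.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MppClaimHtlcSource {
    pub counterparty_node_id: [u8; 33],
    pub funding_txid: [u8; 32],
    pub htlc_id: u64,
}
```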
When we claim an MPP payment, then crash before persisting all the
relevant `ChannelMonitor`s, we rely on the payment data being
available in the `ChannelManager` on restart to re-claim any parts
that haven't yet been claimed. This is fine as long as the
`ChannelManager` was persisted before the `PaymentClaimable` event
was processed, which is generally the case in our
`lightning-background-processor`, but may not be in other cases or
in a somewhat rare race.

In order to fix this, we need to track where all the MPP parts of
a payment are in the `ChannelMonitor`, allowing us to re-claim any
missing pieces without reference to any `ChannelManager` data.

Further, in order to properly generate a `PaymentClaimed` event
against the re-started claim, we have to store various payment
metadata with the HTLC list as well.

Here we take the first step, building a list of MPP parts and
metadata in `ChannelManager` and passing it through to
`ChannelMonitor` in the `ChannelMonitorUpdate`s.
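A hedged sketch of what carrying that data in a monitor update could look like; the type and field names below are stand-ins, not LDK's real API:

```rust
// Hedged sketch of carrying MPP-claim context inside a monitor update; the
// type and field names are illustrative stand-ins, not LDK's real API.

pub struct PaymentPreimage(pub [u8; 32]);

/// One MPP part: enough information to locate the sibling channel and HTLC
/// after a restart, with no `ChannelManager` data required.
pub struct MppPart {
    pub counterparty_node_id: [u8; 33],
    pub funding_txid: [u8; 32],
    pub htlc_id: u64,
}

/// An update step which, alongside the preimage, durably records every
/// channel participating in the MPP claim plus the metadata needed to
/// regenerate a `PaymentClaimed` event on restart.
pub enum MonitorUpdateStep {
    PaymentPreimage {
        payment_preimage: PaymentPreimage,
        mpp_parts: Vec<MppPart>,
        // Payment metadata needed to build `PaymentClaimed` later.
        payment_hash: [u8; 32],
        amount_msat: u64,
    },
}
```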
When we claim an MPP payment, then crash before persisting all the
relevant `ChannelMonitor`s, we rely on the payment data being
available in the `ChannelManager` on restart to re-claim any parts
that haven't yet been claimed. This is fine as long as the
`ChannelManager` was persisted before the `PaymentClaimable` event
was processed, which is generally the case in our
`lightning-background-processor`, but may not be in other cases or
in a somewhat rare race.

In order to fix this, we need to track where all the MPP parts of
a payment are in the `ChannelMonitor`, allowing us to re-claim any
missing pieces without reference to any `ChannelManager` data.

Further, in order to properly generate a `PaymentClaimed` event
against the re-started claim, we have to store various payment
metadata with the HTLC list as well.

Here we store the required MPP parts and metadata in
`ChannelMonitor`s and make them available to `ChannelManager` on
load.
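As a rough illustration, a monitor-side accessor might be shaped like the following sketch, with all names assumed rather than taken from LDK:

```rust
use std::collections::HashMap;

// Hedged sketch of a monitor handing its stored claim state back to the
// manager on load; all names are illustrative, not LDK's actual interface.

pub struct PaymentPreimage(pub [u8; 32]);

/// Claim context written by upgraded code alongside the preimage.
pub struct PaymentClaimDetails {
    pub mpp_parts: Vec<([u8; 32], u64)>, // (funding txid, HTLC id)
    pub amount_msat: u64,
}

pub struct ChannelMonitorSketch {
    // Keyed by payment hash. The details are `None` for preimages written
    // by pre-upgrade code, which is why one legacy-style startup is still
    // required after upgrading (see the final note below).
    payment_preimages: HashMap<[u8; 32], (PaymentPreimage, Option<PaymentClaimDetails>)>,
}

impl ChannelMonitorSketch {
    /// On `ChannelManager` load, expose everything needed to replay a
    /// partially-persisted MPP claim without old manager state.
    pub fn get_stored_preimages(
        &self,
    ) -> &HashMap<[u8; 32], (PaymentPreimage, Option<PaymentClaimDetails>)> {
        &self.payment_preimages
    }
}
```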
In a coming commit we'll use the existing `ChannelManager` claim
flow to claim HTLCs which we found partially claimed on startup,
necessitating having a full `ChannelManager` when we go to do so.

Here we move the re-claim logic later in the `ChannelManager`-read
flow so that a fully-constructed `ChannelManager` is available when
it runs.
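Schematically (with purely illustrative names, not LDK's actual deserialization code), the reordering looks like:

```rust
// Hedged sketch of the reordering in the manager-read path; the structure
// and names are illustrative, not LDK's actual deserialization code.

struct ManagerSketch;

impl ManagerSketch {
    fn replay_partial_mpp_claims(&self) {
        // Reuses the normal claim flow, which requires a fully-built
        // manager - hence this runs at the end of `read`, not the middle.
    }
}

fn read_manager_sketch() -> ManagerSketch {
    // 1. Deserialize channels, payments, and monitor-provided claim data.
    // 2. Construct the complete manager.
    let manager = ManagerSketch;
    // 3. Only now replay partially-claimed MPP HTLCs.
    manager.replay_partial_mpp_claims();
    manager
}
```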
Here we move the logic which shifts claimable payments from
`claimable_payments` to `pending_claiming_payments` into a new
utility function on `ClaimablePayments`. This will allow us to call
the same logic during `ChannelManager` deserialization in a few
commits.
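A minimal sketch of such a utility function, assuming illustrative types rather than LDK's actual structures:

```rust
use std::collections::HashMap;

// Hedged sketch of the factored-out utility; types and names are
// illustrative, not LDK's actual structures.

#[derive(Clone)]
pub struct ClaimingPayment { pub amount_msat: u64 }
pub struct ClaimablePayment { pub amount_msat: u64 }

pub struct ClaimablePayments {
    claimable_payments: HashMap<[u8; 32], ClaimablePayment>,
    pending_claiming_payments: HashMap<[u8; 32], ClaimingPayment>,
}

impl ClaimablePayments {
    /// Move a payment from `claimable_payments` to
    /// `pending_claiming_payments`, returning the claim-in-progress record,
    /// or `None` if the payment was not (or is no longer) claimable. As a
    /// standalone method, this can also be called while deserializing a
    /// `ChannelManager`.
    pub fn begin_claiming_payment(&mut self, payment_hash: [u8; 32]) -> Option<ClaimingPayment> {
        let payment = self.claimable_payments.remove(&payment_hash)?;
        let claiming = ClaimingPayment { amount_msat: payment.amount_msat };
        self.pending_claiming_payments.insert(payment_hash, claiming.clone());
        Some(claiming)
    }
}
```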
In the next commit we'll start using (much of) the normal HTLC
claim pipeline to replay payment claims on startup. In order to do
so, however, we have to properly handle cases where we get a
`DuplicateClaim` back from the channel for an inbound-payment HTLC.

Here we do so, handling the `MonitorUpdateCompletionAction` and
allowing an already-completed RAA blocker.
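A sketch of what tolerating `DuplicateClaim` in the replay path might look like; the enum and handler below are illustrative, not LDK's actual code:

```rust
// Hedged sketch of tolerating `DuplicateClaim` during startup replay; the
// enum and handling here are illustrative, not LDK's actual code.

pub enum ClaimResult {
    /// The preimage was newly applied to this HTLC.
    Claimed,
    /// The channel already knew the preimage (e.g. its monitor update was
    /// persisted before the crash). Expected, not an error, during replay.
    DuplicateClaim,
}

pub fn handle_inbound_htlc_claim(result: ClaimResult) {
    match result {
        ClaimResult::Claimed => {
            // Normal path: queue the monitor-update completion action and
            // install an RAA blocker until the update is persisted.
        }
        ClaimResult::DuplicateClaim => {
            // Replay path: still run the completion action, and accept
            // that the corresponding RAA blocker may already be resolved.
        }
    }
}
```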
When we claim an MPP payment, then crash before persisting all the
relevant `ChannelMonitor`s, we rely on the payment data being
available in the `ChannelManager` on restart to re-claim any parts
that haven't yet been claimed. This is fine as long as the
`ChannelManager` was persisted before the `PaymentClaimable` event
was processed, which is generally the case in our
`lightning-background-processor`, but may not be in other cases or
in a somewhat rare race.

In order to fix this, we need to track where all the MPP parts of
a payment are in the `ChannelMonitor`, allowing us to re-claim any
missing pieces without reference to any `ChannelManager` data.

Further, in order to properly generate a `PaymentClaimed` event
against the re-started claim, we have to store various payment
metadata with the HTLC list as well.

Here we finally implement claiming using the new MPP part list and
metadata stored in `ChannelMonitor`s. In doing so, we use much more
of the existing HTLC-claiming pipeline in `ChannelManager`,
utilizing the on-startup background events flow as well as properly
re-applying the RAA-blockers to ensure preimages cannot be lost.
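Putting the pieces together, the replay might be shaped roughly like this sketch; all helpers and types are assumptions, not LDK's internals:

```rust
// Hedged sketch of the startup replay described above; all helpers and
// types are illustrative stand-ins for LDK's internals.

pub struct PaymentPreimage(pub [u8; 32]);
pub struct MppPart { pub funding_txid: [u8; 32], pub htlc_id: u64 }

pub struct StoredClaim {
    pub preimage: PaymentPreimage,
    pub parts: Vec<MppPart>,
}

/// Re-claim any MPP parts whose monitors never learned the preimage, by
/// queueing claims through the normal startup background-events flow
/// (which emits `ChannelMonitorUpdate`s and re-applies RAA blockers).
pub fn replay_partial_claims(
    stored: &[StoredClaim],
    part_already_claimed: impl Fn(&MppPart) -> bool,
    mut queue_background_claim: impl FnMut(&MppPart, &PaymentPreimage),
) {
    for claim in stored {
        for part in &claim.parts {
            if !part_already_claimed(part) {
                queue_background_claim(part, &claim.preimage);
            }
        }
    }
}
```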
When we discover we've only partially claimed an MPP HTLC during
`ChannelManager` reading, we need to add the payment preimage to
all other `ChannelMonitor`s that were a part of the payment.

We previously did this with a direct call on the `ChannelMonitor`,
requiring users to write the full `ChannelMonitor` to disk to
ensure that the updated information made it there.

This adds quite a bit of delay during initial startup - fully
resilvering each `ChannelMonitor` just to handle this one case is
incredibly excessive.

Over the past few commits we dropped the need to pass HTLCs
directly to the `ChannelMonitor`s, using the background events to
provide `ChannelMonitorUpdate`s instead.

Thus, here we finally drop the requirement to resilver
`ChannelMonitor`s on startup.
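To illustrate the difference (with assumed names throughout), compare the old direct-mutation flow with the new background-event flow:

```rust
// Hedged sketch contrasting the old and new flows; names are illustrative.

pub struct MonitorUpdate; // carries the preimage + MPP data sketched earlier

pub enum BackgroundEvent {
    // New flow: the preimage reaches the monitor as an ordinary,
    // individually-persistable update generated after the manager is read,
    // so only this small update needs to hit disk.
    MonitorUpdateRegeneratedOnStartup(MonitorUpdate),
}

// Old flow (dropped here): mutate each monitor directly during manager
// read, which forced a full re-write ("resilver") of every monitor:
//
//     monitor.provide_payment_preimage(&payment_hash, &preimage);
```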
Because the new startup `ChannelMonitor` persistence semantics rely
on new information stored in the `ChannelMonitor`s only for claims
made by the upgraded code, users upgrading from previous versions
of LDK must apply the old `ChannelMonitor` persistence semantics at
least once (as the old code will be used to handle partial claims).