Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug]: Make Registering HTLC Attempt and SendPayment atomic #9340

Open
ziggie1984 opened this issue Dec 9, 2024 · 1 comment
Open

[bug]: Make Registering HTLC Attempt and SendPayment atomic #9340

ziggie1984 opened this issue Dec 9, 2024 · 1 comment
Labels
bug Unintended code behaviour payments Related to invoices/payments

Comments

@ziggie1984
Copy link
Collaborator

During MPP tests and blinded paths I came across another edge case which would lead to stuck payment which would always relaunch on every startup and never resolve itself.

When registering a payment attempt we need to make sure we register a payment attempt only if we also manifested this new state on the corresponding channel. So in other words we need to make the SendPayment attempt and the registering of the attempt atomic.

Otherwise the payment might never make it to the peer and we never relaunch this payment attempt however the payment is still marked as INFLIGHT with the control tower.

Logfile output for this particular case:

The payment is launched:

2024-12-07 23:06:16.375 [DBG] CRTR: Attempt 1208 for payment 24426b8028f15c8310c608d957b18148ec4ef4c2d8b360f998667d3d72330be3 successfully sent to switch, route: 1091815046447104 (1002204 mSAT) -> 0 (0 mSAT) -> 0 (0 mSAT), cltv 1355
2024-12-07 23:06:16.375 [DBG] CRTR: Collecting result for attempt 1208 in payment 24426b8028f15c8310c608d957b18148ec4ef4c2d8b360f998667d3d72330be3
2024-12-07 23:06:16.375 [DBG] HSWC: ChannelLink(5601e8ab0a9ab32fde02b7e73571da64aad3848b8ca59b21cea327e15fe44cf7:0): queueing keystone of ADD open circuit: (Chan ID=0:0:0, HTLC ID=1208)->(Chan ID=993:1:0, HTLC ID=214)
2024-12-07 23:06:16.375 [DBG] PEER: Peer(034d11a4129c8c73856c41d0c5afb5108a635ba852ead2ca2a253b01534638bd2c): Sending UpdateAddHTLC(chan_id=f74ce45fe127a3ce219ba58c8b84d3aa64da7135e7b702de2fb39a0aabe80156, id=214, amt=1002204 mSAT, expiry=1355, hash=24426b8028f15c8310c608d957b18148ec4ef4c2d8b360f998667d3d72330be3, blinding_point=, custom_records=map[106823:[0]]) to 034d11a4129c8c73856c41d0c5afb5108a635ba852ead2ca2a253b01534638bd2c@127.0.0.1:9739

Then I abort the LND daemon:

2024-12-07 23:06:16.397 [ERR] CRTR: Error collecting result for attempt 1205 in payment 24426b8028f15c8310c608d957b18148ec4ef4c2d8b360f998667d3d72330be3: htlcswitch shutting down
2024-12-07 23:06:16.397 [DBG] CRTR: Result collected for attempt 1205 in payment 24426b8028f15c8310c608d957b18148ec4ef4c2d8b360f998667d3d72330be3
2024-12-07 23:06:16.397 [ERR] CRTR: Error collecting result for attempt 1207 in payment 24426b8028f15c8310c608d957b18148ec4ef4c2d8b360f998667d3d72330be3: htlcswitch shutting down
2024-12-07 23:06:16.397 [DBG] CRTR: Result collected for attempt 1207 in payment 24426b8028f15c8310c608d957b18148ec4ef4c2d8b360f998667d3d72330be3
2024-12-07 23:06:16.397 [ERR] CRTR: Error collecting result for attempt 1204 in payment 24426b8028f15c8310c608d957b18148ec4ef4c2d8b360f998667d3d72330be3: htlcswitch shutting down
2024-12-07 23:06:16.397 [DBG] CRTR: Result collected for attempt 1204 in payment 24426b8028f15c8310c608d957b18148ec4ef4c2d8b360f998667d3d72330be3
2024-12-07 23:06:16.397 [DBG] PEER: Peer(034d11a4129c8c73856c41d0c5afb5108a635ba852ead2ca2a253b01534638bd2c): Received UpdateFailHTLC(chan_id=f74ce45fe127a3ce219ba58c8b84d3aa64da7135e7b702de2fb39a0aabe80156, id=212, reason=f8fdbbc86efdee4b79be17489b9b6f409c56eeb2d59b4945d18fef33cff416659cfb7aa4c37adf7788fc45763a78a4600243e097ba277c38dde8c54d70ffbef79ce1d753a00db0c4ec487990f8b1498e0d32d85077eee6b9daa7224f0a68527d6b2635c3add1e99e79164df216f52f0b642c33391b5ad8684e29bbef3c797ce612a88eea8984610fe15f802b6b3d0f9b38a8cac5f003d612ce316bb4d63db040b3388b1acc18586308ae7d8d114a435196759252eab2811ccca0ed1a35f0be558be5d65f0e9bcb722ab438f289525df4653e5cd448038c5b912c95552eb48820ca7a0b1ccd549e1c60bdd5832462f1d53bdea152d7315b0a47016f9353603cb250093940053c1076ddc4c87c3ca2f8a9c119e3c9fd6478e18a23e07b6ba15f4648439c1d) from 034d11a4129c8c73856c41d0c5afb5108a635ba852ead2ca2a253b01534638bd2c@127.0.0.1:9739
2024-12-07 23:06:16.402 [ERR] HSWC: ChannelLink(5601e8ab0a9ab32fde02b7e73571da64aad3848b8ca59b21cea327e15fe44cf7:0): failing link: unable to update commitment: received quit signal with error: internal error
2024-12-07 23:06:16.402 [ERR] HSWC: ChannelLink(5601e8ab0a9ab32fde02b7e73571da64aad3848b8ca59b21cea327e15fe44cf7:0): link failed, exiting htlcManager
2024-12-07 23:06:16.402 [INF] HSWC: ChannelLink(5601e8ab0a9ab32fde02b7e73571da64aad3848b8ca59b21cea327e15fe44cf7:0): exited
2024-12-07 23:06:16.402 [ERR] PEER: Peer(034d11a4129c8c73856c41d0c5afb5108a635ba852ead2ca2a253b01534638bd2c): unable to send msg to remote peer: peer exiting
2024-12-07 23:06:16.402 [DBG] PEER: Peer(034d11a4129c8c73856c41d0c5afb5108a635ba852ead2ca2a253b01534638bd2c): Sending Error(chan_id=f74ce45fe127a3ce219ba58c8b84d3aa64da7135e7b702de2fb39a0aabe80156, err=internal error) to 034d11a4129c8c73856c41d0c5afb5108a635ba852ead2ca2a253b01534638bd2c@127.0.0.1:9739

as seen above one msg fails to be flushed to the wire, thats very likely the msg from the above Sending UpdateAddHTLC msg.

Now this payment is marked as INFLIGHT (because the htlc was never marked as failed or settled) which has the side effect of the payment always relaunching its resolver during startup.

2024-12-07 23:29:28.367 [INF] CRTR: Resuming payment 24426b8028f15c8310c608d957b18148ec4ef4c2d8b360f998667d3d72330be3
2024-12-07 23:29:28.367 [INF] DISC: Authenticated Gossiper starting
2024-12-07 23:29:28.367 [INF] NTFN: New block epoch subscription
2024-12-07 23:29:28.370 [INF] INVC: InvoiceRegistry starting...
2024-12-07 23:29:28.370 [DBG] DISC: Requesting online notification for peer=0327cb365db2601f42eca18744da82e7a79a2d7af0fba5d8b80f7f461d26745376
2024-12-07 23:29:28.370 [DBG] DISC: Requesting online notification for peer=034d11a4129c8c73856c41d0c5afb5108a635ba852ead2ca2a253b01534638bd2c
2024-12-07 23:29:28.370 [INF] NTFN: New block epoch subscription
2024-12-07 23:29:28.370 [DBG] INVC: Adding 0 pending invoices to the expiry watcher
2024-12-07 23:29:28.370 [DBG] INVC: InvoiceRegistry started
2024-12-07 23:29:28.370 [INF] HSWC: Onion processor starting
2024-12-07 23:29:28.370 [DBG] DISC: New block: height=1128, hash=6247c67ce49f0f0423366772d16dd53c76a4752b6e7a29bb293e8b4f6974baff
2024-12-07 23:29:28.376 [INF] NTFN: New block epoch subscription
2024-12-07 23:29:28.376 [INF] NANN: Channel Status Manager starting
2024-12-07 23:29:28.377 [INF] CHFT: ChannelEventStore starting...
2024-12-07 23:29:28.378 [INF] CHFT: Adding 2 channels to event store
2024-12-07 23:29:28.378 [DBG] CHFT: ChannelEventStore started
2024-12-07 23:29:28.378 [INF] CHBU: chanbackup.SubSwapper starting
....
2024-12-07 23:29:28.382 [INF] CRTR: Resuming HTLC attempt 1208 for payment 24426b8028f15c8310c608d957b18148ec4ef4c2d8b360f998667d3d72330be3
@ziggie1984 ziggie1984 added bug Unintended code behaviour payments Related to invoices/payments labels Dec 9, 2024
@ziggie1984
Copy link
Collaborator Author

ziggie1984 commented Dec 9, 2024

Additionally it would be very beneficial to have a rationale why this peer msg is not called in sync:

_ = l.cfg.Peer.SendMessage(false, htlc)

Moreover I think changing the log-output here and only printing the sending if the msg was actually flushed to the wire would make things more clearer.

So I propose switching the order here:

lnd/peer/brontide.go

Lines 2449 to 2455 in fb429d6

if msg != nil {
p.logWireMessage(msg, false)
}
noiseConn := p.cfg.Conn
flushMsg := func() error {

@ziggie1984 ziggie1984 changed the title [bug]: Edge Case Stuck Payment [bug]: Make Registering HTLC Attempt and SendPayment atomic Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Unintended code behaviour payments Related to invoices/payments
Projects
None yet
Development

No branches or pull requests

1 participant