
Workaround for LND to cause a force-close on our channel #8213

Open

wants to merge 2 commits into master

Conversation

@adi2011 (Collaborator) commented Apr 5, 2025

LND does not force close on us when we recover our node using emergency.recover or peer storage backups (https://github.com/lightningnetwork/lnd/blob/abb1e3463f3a83bbb843d5c399869dbe930ad94f/htlcswitch/link.go#L2119).

We will send a channel reestablish with bogus data to make them force close on us.
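
For illustration only, here is a minimal Python sketch of what such a bogus channel_reestablish looks like on the wire, following the BOLT #2 field layout. The actual change lives in CLN's C code (connectd, using the generated towire_* helpers); the helper name and placeholder values below are made up.

```python
# Sketch only: byte layout of a deliberately bogus channel_reestablish,
# following BOLT #2. Not CLN's actual serialization code.
import struct

CHANNEL_REESTABLISH_TYPE = 136  # BOLT message type for channel_reestablish


def bogus_channel_reestablish(channel_id: bytes, my_current_point: bytes) -> bytes:
    """Build a reestablish that no honest peer could have produced."""
    assert len(channel_id) == 32
    assert len(my_current_point) == 33
    return (
        struct.pack(">H", CHANNEL_REESTABLISH_TYPE)
        + channel_id
        + struct.pack(">Q", 0)   # next_commitment_number = 0 (bogus)
        + struct.pack(">Q", 0)   # next_revocation_number (placeholder)
        + bytes(32)              # your_last_per_commitment_secret: deliberately invalid
        + my_current_point       # my_current_per_commitment_point
    )
```

The intent, per the description above, is that the peer sees a reestablish it cannot reconcile with its own state and unilaterally closes the channel.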

@vincenzopalazzo (Collaborator) commented

What would be the right behaviour in this case? I'm wondering if we should report this to LND while keeping the workaround.

@adi2011 (Collaborator, Author) commented Apr 7, 2025

The correct behaviour should be LND responding with a unilateral close after receiving the error. This is a known behaviour, and LDK handles it in a similar manner.

@vincenzopalazzo (Collaborator) commented Apr 7, 2025

> The correct behaviour should be LND responding with a unilateral close after receiving the error.

Are we reporting this wrong behaviour to LND?

@endothermicdev (Collaborator) left a comment


That was an interesting read. An old LND gossip bug led old CLN versions to be too aggressive in sending an error, so LND treats all errors as soft rather than force-closing the channel, which requires current CLN to botch the channel reestablish to explicitly initiate a force close. I guess it makes sense, but some of these old bugs were resolved five years ago and are still reverberating between implementations. It's probably time to revisit this in spec discussion.

In the meantime this is a pragmatic solution.

ACK 92fc59b

@adi2011 (Collaborator, Author) commented Apr 9, 2025

@vincenzopalazzo It's a known issue across implementations — I assume someone must've reported it already. I'll check and file an issue if I don't find one in their repository.

@whitslack (Collaborator) commented

> An old LND gossip bug led old CLN versions to be too aggressive in sending an error, so LND treats all errors as soft rather than force-closing the channel, which requires current CLN to botch the channel reestablish to explicitly initiate a force close.

I recently encountered a corollary to this. LND sends an error but then also broadcasts its own commitment to force-close the channel, as though it expects the error it sent not to trigger its peer to force-close the channel.

@ShahanaFarooqui added this to the v25.05 milestone Apr 14, 2025

Unfortunately, LND does not force close channels on receiving an error; they blame us for this behaviour (https://github.com/lightningnetwork/lnd/blob/abb1e3463f3a83bbb843d5c399869dbe930ad94f/htlcswitch/link.go#L2119).

To fix this, we send them a bogus channel_reestablish with a commitment_number of 0 and an invalid last_per_commitment_secret.

Key Changes:
  - In connect_activate_subd, if we detect a stub channel, we serialize and send a bogus channel_reestablish message.

Changelog-Fixed: Fixed LND's non-responsive behaviour on receiving an error.

Make sure we send a bogus channel_reestablish after recovering from the emergency.recover file or a peer storage backup.

Key Changes:
 - Add wait_for_log() with appropriate debug statements

Sort `listclosechannels` to maintain example sequence
l1.daemon.wait_for_log('peer_out WIRE_ERROR')

l2.daemon.wait_for_log('bad reestablish commitment_number: 0')
Collaborator

I believe this is the warning sent to the peer over the wire (l2 back to l1), but l1 disconnects immediately, so there's no opportunity for l1 to catch it. That means we either need to separately log the bad reestablish in connectd so this test can check for that log message on l2, or we assume we're sending the bogus revocation correctly but can't confirm it with this blackbox test beyond seeing that it was transmitted.

lightningd-1 2025-04-30T20:47:27.026Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#1: Sending a bogus channel_reestablish message to make the peer unilaterally close the channel.
...
lightningd-1 2025-04-30T20:47:27.029Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: peer_out WIRE_CHANNEL_REESTABLISH
lightningd-1 2025-04-30T20:47:27.029Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: peer_out WIRE_ERROR
lightningd-1 2025-04-30T20:47:27.029Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-connectd: disconnect_peer
...
0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-connectd: Activating for message WIRE_CHANNEL_REESTABLISH
lightningd-2 2025-04-30T20:47:27.033Z DEBUG   0266e4598d1d3c415f572a8488830b60f7e744ed9235eb0b1ba93283b315c03518-lightningd: Telling connectd to send error 0011f7133e8b2a3e1241e601b48017f2020f1e456d1b37022912b89d61ee06508fbd008f6368616e6e656c643a207265636569766564204552524f52206368616e6e656c20663731333365386232613365313234316536303162343830313766323032306631653435366431623337303232393132623839643631656530363530386662643a20556e6b6e6f776e206368616e6e656c20666f7220574952455f4348414e4e454c5f524545535441424c495348

So I think we either need to remove this line from the test or add a separate logging event to check on the sender's side (l2).
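
As a rough sketch of that, the test could assert on log lines already visible in the capture above (l1 and l2 are the test's existing node fixtures). Whether l2 ever logs the bad reestablish itself depends on adding that separate logging event, so that last check is hypothetical:

```python
# What l1 (the recovering node) demonstrably logs before disconnecting,
# taken from the capture above:
l1.daemon.wait_for_log(r'Sending a bogus channel_reestablish message')
l1.daemon.wait_for_log(r'peer_out WIRE_CHANNEL_REESTABLISH')

# Hypothetical: only works if connectd/channeld on l2 is taught to log the
# bad reestablish it receives (the separate logging event suggested above).
l2.daemon.wait_for_log(r'bad reestablish commitment_number: 0')
```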
