
Implement quiescence protocol #3588

Open · wpaulino wants to merge 7 commits into main

Conversation

@wpaulino (Contributor) commented Feb 4, 2025

Quiescence is a new protocol feature that allows for channels to undergo "fundamental" changes (i.e., protocol upgrade) while there are no pending updates on either side. Its first use case will be to carry out channel splices, to ensure new HTLC/fee updates are not made while a splice is being negotiated.

Each side of the channel is allowed to send a stfu message once none of its outbound updates are pending on either side (i.e., all updates are irrevocably committed on both commitment transactions). Once both sides have exchanged stfu, the channel becomes quiescent. A message timeout is enforced during the quiescence handshake so that we can eventually re-establish the channel and propose new HTLC/fee updates again.

Several new state flags have been added to ChannelState::ChannelReady to track the progress of the quiescence handshake. Once the channel becomes quiescent, all flags related to the handshake are cleared and the QUIESCENT flag is enabled. While quiescence is not a persistent protocol (it implicitly terminates upon peer disconnection) and no updates can be made during it, we still need to track MONITOR_UPDATE_IN_PROGRESS, as it may be required by a quiescence-dependent protocol such as splicing.
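
To make the handshake bookkeeping concrete, here is a minimal standalone sketch in Rust. It is illustrative only: the struct name, flag names, and bit positions below are assumptions and do not claim to match LDK's actual per-state flag definitions.

// Hedged sketch: a standalone model of the quiescence handshake flags described
// above. Names and bit positions are illustrative assumptions.
struct QuiescenceFlags(u32);

impl QuiescenceFlags {
    const LOCAL_STFU_SENT: u32 = 1 << 0; // we sent `stfu`, awaiting theirs
    const REMOTE_STFU_SENT: u32 = 1 << 1; // they sent `stfu`, we owe ours
    const QUIESCENT: u32 = 1 << 2; // both `stfu` messages exchanged

    fn is_set(&self, flag: u32) -> bool {
        self.0 & flag != 0
    }

    // Once both sides have sent `stfu`, clear the handshake flags and mark the
    // channel quiescent, mirroring the description above.
    fn maybe_enter_quiescence(&mut self) {
        if self.is_set(Self::LOCAL_STFU_SENT) && self.is_set(Self::REMOTE_STFU_SENT) {
            self.0 &= !(Self::LOCAL_STFU_SENT | Self::REMOTE_STFU_SENT);
            self.0 |= Self::QUIESCENT;
        }
    }
}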

codecov bot commented Feb 4, 2025

Codecov Report

Attention: Patch coverage is 87.95580% with 109 lines in your changes missing coverage. Please review.

Project coverage is 88.56%. Comparing base (6cf270d) to head (570ddae).

Files with missing lines               Patch %   Missing lines
lightning/src/ln/channelmanager.rs     70.22%    45 missing, 8 partials
lightning/src/ln/channel.rs            83.66%    47 missing, 3 partials
lightning/src/ln/quiescence_tests.rs   98.54%    6 missing
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #3588    +/-   ##
========================================
  Coverage   88.56%   88.56%            
========================================
  Files         149      150     +1     
  Lines      117091   117925   +834     
  Branches   117091   117925   +834     
========================================
+ Hits       103699   104446   +747     
- Misses      10876    10954    +78     
- Partials     2516     2525     +9     


@wpaulino (Contributor, Author) commented Feb 7, 2025

This no longer depends on #3573. I've included an alternative approach here that @TheBlueMatt and I discussed offline.

@wpaulino added the "weekly goal" label (Someone wants to land this this week) on Feb 7, 2025
let peer_state_mutex = per_peer_state.get(counterparty_node_id).ok_or_else(|| {
	debug_assert!(false);
	MsgHandleErrInternal::send_err_msg_no_close(
		format!("Can't find a peer matching the passed counterparty node_id {}", counterparty_node_id),
Contributor:
Probably just my reading preference, but we could use format!("Can't find a peer matching the passed counterparty node_id {counterparty_node_id}").

@vincenzopalazzo (Contributor) left a review comment:
LGTM, I left some comments along the way

ACK 047ddd3

@TheBlueMatt (Collaborator) left a review comment:
Thanks! How does this interact with async signing? It seems like it should consider async signing pending cases in the decision to send stfu/move to quiescent, no?

@wpaulino (Contributor, Author) commented Feb 11, 2025

How does this interact with async signing? It seems like it should consider async signing pending cases in the decision to send stfu/move to quiescent, no?

Yeah, should just need to also check for signer_pending_revoke_and_ack/commitment_update in has_pending_channel_update. I'll add some test coverage as well.
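
For illustration, a standalone Rust sketch of the kind of check being discussed follows; the struct and field names are assumptions chosen so the example compiles on its own, and the real has_pending_channel_update on LDK's channel context may differ.

// Hedged sketch: model of "do we still owe or await any update-related message
// (including async-signing cases)?". Field names are illustrative assumptions.
struct ChannelUpdateStatus {
    monitor_pending_revoke_and_ack: bool,
    signer_pending_revoke_and_ack: bool,
    monitor_pending_commitment_signed: bool,
    signer_pending_commitment_update: bool,
    pending_update_count: usize, // updates not yet irrevocably committed
}

impl ChannelUpdateStatus {
    fn has_pending_channel_update(&self) -> bool {
        self.monitor_pending_revoke_and_ack
            || self.signer_pending_revoke_and_ack
            || self.monitor_pending_commitment_signed
            || self.signer_pending_commitment_update
            || self.pending_update_count > 0
    }
}

fn main() {
    let status = ChannelUpdateStatus {
        monitor_pending_revoke_and_ack: false,
        signer_pending_revoke_and_ack: true, // async signer still outstanding
        monitor_pending_commitment_signed: false,
        signer_pending_commitment_update: false,
        pending_update_count: 0,
    };
    // With an async-signing operation outstanding, we must not send `stfu` yet.
    assert!(status.has_pending_channel_update());
}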

@wpaulino (Contributor, Author) commented:
@TheBlueMatt addressed all your comments, should be ready for review again. Since things changed a good bit, I started fuzzing the new HEAD.

Comment on lines +2909 to +2977
|| self.monitor_pending_revoke_and_ack || self.signer_pending_revoke_and_ack
|| self.monitor_pending_commitment_signed || self.signer_pending_commitment_update
Collaborator:
Ah, I realize these maybe don't make sense to consider in the disconnect case, but they do make sense to consider in the quiescence case.

@@ -6960,6 +7096,14 @@ impl<SP: Deref> FundedChannel<SP> where
}
assert!(!matches!(self.context.channel_state, ChannelState::ShutdownComplete));

// TODO: The spec is pretty vague regarding the handling of shutdown within quiescence.
if self.context.channel_state.is_local_stfu_sent()
Collaborator:
We might have sent the stfu but they sent the shutdown before they received it. I don't think it makes sense to check this.

Contributor (author):
We're still disconnecting so it should be fine? They can retry the shutdown after reconnecting since it'll be like the stfu never happened (unless it's also resent of course). This is really just a placeholder until we decide what to do at the spec level anyway.

Collaborator:
I feel like that might just get us into a loop instead: reconnect, we resend stfu, they resend shutdown, disconnect, and so on. IMO we should handle this instead: abort the stfu/thing we want to do and just shutdown.

Contributor (author):
Yeah, would be nice to avoid that. Maybe we leave this as follow-up work then? We don't have retrying logic yet, and I'd hate to have to implement all of this, just for it to be undone because the spec chose to go in a different direction.

if !self.context.is_live() {
	return Err(ChannelError::Ignore("Channel is not in a live state to propose quiescence".to_owned()));
}
if self.context.channel_state.is_quiescent() {
Collaborator:
Shouldn't we also fail if the counterparty has sent stfu (as that means they'll get the talking stick after we went quiescent, but the caller almost certainly wanted to go quiescent so that they can get the talking stick and do something)?

Contributor (author):
Absolutely. I was planning to leave it for follow-up work when integrating with splicing, since we'd have a better idea of how propose_quiescence should be designed when not used in a test-only scenario. We'd have to continue retrying propose_quiescence (up to some timeout?) until we're the initiator to get a locally requested splice through.

@TheBlueMatt (Collaborator) commented Feb 13, 2025:
Absolutely. I was planning to leave it for follow-up work when integrating with splicing since we'd have a better idea how propose_quiescence should be designed when not used in a test-only scenario.

Seemed like we have a pretty reasonable idea already, no?

We have to continue retrying (up to some timeout?) propose_quiescence until we're the initiator to get a locally requested splice through.

If we want to retry quiescence until it works we need a lot more tracking and logic in a few places, including probably new states, no? I don't think the current code is conducive to that at all and was assuming we wouldn't (or would do it at a higher level)? For splice specifically we can add our stuff to the splice the counterparty initiates too so there would presumably be no need.

Contributor (author):
Seemed like we have a pretty reasonable idea already, no?

Well that was just one idea. I haven't given it too much thought, and trying to do it all in one PR seems overkill.

If we want to retry quiescence until it works we need a lot more tracking and logic in a few places, including probably new states, no? I don't think the current code is conducive to that at all and was assuming we wouldn't (or would do it at a higher level)?

Correct, this PR just serves as a base implementation of the protocol. The need to retry only becomes relevant when trying to carry out a user's request to do "something fundamental" (e.g., a splice). All the tracking and logic would operate at a higher level, unless you're proposing it shouldn't?

For splice specifically we can add our stuff to the splice the counterparty initiates too so there would presumably be no need.

Sure, if that opportunity is available we should take advantage of it, but we can't always rely on it.

Collaborator:
Correct, this PR just serves as a base implementation of the protocol. The need to retry only becomes relevant when trying to carry out a user's request to do "something fundamental" (e.g., a splice). All the tracking and logic would operate at a higher level, unless you're proposing it shouldn't?

Sure, that's fine, but if we're gonna do retries at a higher level then we probably want to go ahead and write the API here to be fallible and let the higher level do retries :p
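
As an aside, a fallible shape like the one being suggested might look roughly like the following sketch; the error variants, struct, and field names are illustrative assumptions, not LDK's actual API.

// Hedged sketch of a fallible `propose_quiescence`-style API. All names below
// are assumptions for illustration only.
struct ChannelQuiescenceState {
    is_live: bool,
    is_quiescent: bool,
    remote_stfu_sent: bool,
}

enum QuiescenceError {
    ChannelNotLive,
    AlreadyQuiescent,
    CounterpartyStfuSent, // they would hold the initiator role ("talking stick")
}

fn propose_quiescence(state: &ChannelQuiescenceState) -> Result<(), QuiescenceError> {
    if !state.is_live {
        return Err(QuiescenceError::ChannelNotLive);
    }
    if state.is_quiescent {
        return Err(QuiescenceError::AlreadyQuiescent);
    }
    if state.remote_stfu_sent {
        // Surface the failure and let a higher layer decide whether to retry.
        return Err(QuiescenceError::CounterpartyStfuSent);
    }
    Ok(())
}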

Contributor (author):
Let's just change it when we get there? This is just a test helper for now, really.

Collaborator:
I'm gonna forget by then, but okay 🤷‍♂️

Contributor (author):
If we do forget, shipping retries with zero tests at all (which would immediately catch this) would be pretty irresponsible...

@jkczyz (Contributor) left a review comment:
Made it through the first three commits. Looks good so far. Will continue reviewing but figured I'd send out some minor comments in the meantime.

Commit messages:

The existing `ChannelError::Warn` variant only sends the warning and
does not disconnect. There are certain cases where we want to just send
a warning, and other cases where we want to also disconnect, so we
keep both variants around.
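
As an illustration of the two behaviors this commit message distinguishes, a hedged sketch follows; `Warn` is an existing LDK variant, while the `WarnAndDisconnect` name and payload here are assumptions for illustration.

// Hedged sketch of the two warning behaviors described above.
enum ChannelError {
    // Send a `warning` message to the peer but keep the connection open.
    Warn(String),
    // Send a `warning` message and then disconnect the peer.
    WarnAndDisconnect(String),
}

fn should_disconnect(err: &ChannelError) -> bool {
    match err {
        ChannelError::Warn(_) => false,
        ChannelError::WarnAndDisconnect(_) => true,
    }
}
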
Quiescence is a new protocol feature that allows for channels to undergo
"fundamental" changes (i.e., protocol upgrade) while there are no
pending updates on either side. Its first use case will be to carry out
channel splices, to ensure new HTLC/fee updates are not made while a
splice is being negotiated.

Each side of the channel is allowed to send a `stfu` message once none
of its outbound updates are pending on either side (i.e., all updates
are irrevocably committed on both commitment transactions). Once both
sides have exchanged `stfu`, the channel becomes quiescent. A message
timeout is enforced during the quiescence handshake so that we can
eventually re-establish the channel and propose new HTLC/fee updates
again.

Several new state flags have been added to `ChannelState::ChannelReady`
to track the progress of the quiescence handshake. Once the channel
becomes quiescent, all flags related to the handshake are cleared, and
the `QUIESCENT` flag is enabled. While quiescence is not a persistent
protocol (it implicitly terminates upon peer disconnection) and no
updates can be made during it, we still need to track
`MONITOR_UPDATE_IN_PROGRESS`, as it may be required by a
quiescence-dependent protocol such as splicing.

We previously would avoid freeing our holding cells upon a
`revoke_and_ack` if a monitor update was in progress, which we checked
explicitly. With quiescence, if we've already sent `stfu`, we're not
allowed to make further commitment updates, so we must also avoid
freeing our holding cells in such cases.

Along the way, we also remove the special handling of in-progress
monitor updates now that it behaves the same as the handling of being
quiescent.
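
A minimal sketch of the gating condition described above, under assumed field names so it stands alone (not the actual LDK holding-cell code):

// Hedged sketch: do not free the holding cell while a monitor update is in
// flight or after we've sent `stfu`, since no new commitment updates are
// allowed in either case. Field names are illustrative assumptions.
struct HoldingCellGate {
    monitor_update_in_progress: bool,
    local_stfu_sent: bool,
    is_quiescent: bool,
}

impl HoldingCellGate {
    fn can_free_holding_cell(&self) -> bool {
        !self.monitor_update_in_progress && !self.local_stfu_sent && !self.is_quiescent
    }
}
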
With the introduction of `has_pending_channel_update`, we can now
determine whether any messages are owed to irrevocably commit HTLC
updates based on the current channel state. We prefer using the channel
state over the manual tracking previously done, so that there is a
single source of truth. We also gain the ability to expect to receive multiple
messages at once, which will become relevant with the quiescence
protocol, where we may be waiting on a counterparty `revoke_and_ack` and
`stfu`.

Since new updates are not allowed during quiescence (local updates enter
the holding cell), we want to ensure quiescence eventually terminates if
the handshake takes too long or our counterparty is uncooperative.
Disconnecting implicitly terminates quiescence, so the holding cell can
be freed upon re-establishing the channel (assuming quiescence is not
requested again).
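
To make the timeout behavior concrete, here is a hedged sketch driven by a periodic timer tick; the struct, field names, and tick threshold are illustrative assumptions rather than LDK's actual values.

// Hedged sketch: count timer ticks while a quiescence handshake (or quiescent
// state awaiting its dependent protocol) is outstanding, and disconnect once a
// threshold is hit, implicitly terminating quiescence so the holding cell can
// be freed on re-establish.
const QUIESCENCE_TIMEOUT_TICKS: u8 = 2;

struct QuiescenceTimeout {
    stfu_sent_or_received: bool,
    ticks_waiting: u8,
}

impl QuiescenceTimeout {
    // Called on each timer tick; returns true if we should disconnect the peer.
    fn on_timer_tick(&mut self) -> bool {
        if self.stfu_sent_or_received {
            self.ticks_waiting += 1;
            self.ticks_waiting >= QUIESCENCE_TIMEOUT_TICKS
        } else {
            self.ticks_waiting = 0;
            false
        }
    }
}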
@wpaulino (Contributor, Author) commented:
Rebased due to conflicts and added some more comments to the test you requested @jkczyz.

@jkczyz (Contributor) left a review comment:
LGTM
