Intercept HTLC forwards for JIT channels #1835
From my understanding, `InterceptId` is unique but non-random, i.e. a legitimate HTLC sender with knowledge of the routing hint could replay it (even inadvertently) multiple times. I think we have protection against duplicate intercepted payments at L5243. However, this routing hint could leak to an unauthorized third party that could reuse this knowledge to probe for the presence of a pending HTLC in some kind of deanonymization attack (e.g. "identify LSP X as the offline sender of Alice"). I wonder if we should do some linear-time processing of duplicate intercepted payments to mask this behavior, where we would automatically `fail_intercepted_htlc()` after one minute.

Another direction could be to be stricter on the client-side processing of the `incoming_shared_secret`, by suggesting some client salting. Or document that any leak of `incoming_shared_secret` opens the door to deanonymization attacks, or even worse?

Note, I don't remember the general rules on processing invoices, and offers improve on that, so those concerns might already exist with leaks of ordinary invoices.
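A minimal sketch of the auto-fail idea, assuming the application tracks intercepts itself and drives expiry from a timer; the `InterceptTimeouts` type is hypothetical and `InterceptId` is stubbed to keep the sketch self-contained:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Stand-in for lightning::ln::channelmanager::InterceptId, stubbed so the
/// sketch compiles on its own.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct InterceptId([u8; 32]);

/// Hypothetical tracker the application drives from its event loop.
struct InterceptTimeouts {
    deadline: Duration,
    pending: HashMap<InterceptId, Instant>,
}

impl InterceptTimeouts {
    fn new(deadline: Duration) -> Self {
        Self { deadline, pending: HashMap::new() }
    }

    /// Call on each Event::HTLCIntercepted.
    fn on_intercepted(&mut self, id: InterceptId) {
        self.pending.entry(id).or_insert_with(Instant::now);
    }

    /// Call periodically (e.g. once per second from a background task).
    /// `fail` would wrap ChannelManager::fail_intercepted_htlc; failing every
    /// unhandled intercept at a fixed age masks the timing side channel.
    fn expire_stale(&mut self, mut fail: impl FnMut(InterceptId)) {
        let now = Instant::now();
        let deadline = self.deadline;
        self.pending.retain(|&id, &mut first_seen| {
            if now.duration_since(first_seen) >= deadline { fail(id); false }
            else { true }
        });
    }
}
```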
I'm a bit confused how an attacker would get the `incoming_shared_secret` without having the node's privkey? I agree that if the attacker had that piece of info plus an intercept scid, they may be able to probe for pending intercepts via some timing attack, though note that the error returned is identical to the "don't have an available channel for forwarding" error.
I agree this would assume either a leak of the sender's secret key or of the routing hop's secret key. In both cases, I think we can assume you're opening the door to a few privacy leaks, and worse. As the `InterceptId` is based on the shared secret, an entity with a leak of the intercept id shouldn't be able to hit the `Occupied` entry at L5256 in `channelmanager`.
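For context on "based on the shared secret": as I read the PR, the id is a plain hash of the HTLC's `incoming_shared_secret`; treat the exact preimage as an assumption in this sketch.

```rust
use bitcoin::hashes::{sha256::Hash as Sha256, Hash};

// Deterministic and non-random: replaying the same HTLC (same onion shared
// secret) maps to the same id, which is what the duplicate-intercept check
// discussed above keys on. `into_inner` is the bitcoin_hashes API of this
// era; newer releases call it `to_byte_array`.
fn intercept_id_for(incoming_shared_secret: &[u8; 32]) -> [u8; 32] {
    Sha256::hash(incoming_shared_secret).into_inner()
}
```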
This is an internal map whose growth is driven by an entity external to the node. If the intercepted HTLCs are not drained by the user with `forward_intercepted_htlc()` or `fail_intercepted_htlc()` (especially if they were not expected by the application module and there is no automatic timeout for them), it could constitute a DoS vector. Should we implement a hard map bound (`MAX_PENDING_INTERCEPTED_HTLC`) in LDK, at the price of lower processing capability, or make the documentation of `HTLCIntercepted` more precise, stating that `forward_intercepted_htlc` or `fail_intercepted_htlc` MUST be called after a timeout expires, and suggest a value for it (e.g. 10 min)? We could also flush the map after receiving X blocks.

What I'm worried about is a third-party HTLC sender exploiting a naive implementation of an HTLC interceptor because the implementation guidelines we're providing are too loose. What do you think? How else could things go wrong with `pending_intercepted_htlcs`?
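For concreteness, a sketch of the hard-bound option; the constant's name comes from the comment above, while its value, the stand-in types, and the fail-on-overflow policy are assumptions:

```rust
use std::collections::HashMap;

// Hypothetical bound with an arbitrary value.
const MAX_PENDING_INTERCEPTED_HTLC: usize = 1024;

// Stand-ins for LDK's InterceptId key and pending-HTLC value types.
type InterceptId = [u8; 32];
type PendingAddHTLCInfo = Vec<u8>;

/// Returns Err when at capacity, signalling the caller to fail the HTLC
/// back immediately (with the usual unknown_next_peer error) rather than
/// letting a remote sender grow the map without bound.
fn queue_intercept(
    pending: &mut HashMap<InterceptId, PendingAddHTLCInfo>,
    id: InterceptId,
    htlc: PendingAddHTLCInfo,
) -> Result<(), PendingAddHTLCInfo> {
    if pending.len() >= MAX_PENDING_INTERCEPTED_HTLC {
        return Err(htlc);
    }
    pending.insert(id, htlc);
    Ok(())
}
```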
Let me know if this isn't resolved, since as you mention below we have ee70d50. My opinion is that that commit plus the config knob is sufficient for now, but we could consider a `MAX_PENDING_INTERCEPTED_HTLC` in the future.
Sure, I'm good with 5a543a0 for now. However, could you add a follow-up issue with a reference to `MAX_PENDING_INTERCEPTED_HTLC`? It can be a source of significant DoS and I would like to be sure we end up implementing it.

Also, I think it would be worth documenting `accept_intercept_htlcs` with an explicit warning to users that it could be a source of DoS or deanonymization attacks, depending on their intercepting modules. Even if the default is currently false, we could have a comment like "The current intercepting API is still experimental and shenanigans from the HTLC sender can be expected. Be conservative in your usage of it."
I don't see why this is a DoS vector at all - it can't grow to more than 400-something entries per channel, entries which can always exist in our channels. It's a relatively small constant-factor increase over our existing per-HTLC usage (and probably not the biggest memory usage you can get per HTLC).
Note, the new map is global at the `ChannelManager` level, and `PendingAddHTLCInfo` -> `PendingHTLCInfo` -> `PendingHTLCRouting` -> `msgs::OnionPacket`, whose current size is 1336 bytes. As of today the biggest Lightning nodes have something like ~2000 channels. So 2000 * 483 * 1336 = 1,290,576,000 bytes, i.e. more than 1 GB. This starts to be a lot of memory if you can find other internal maps to DoS and combine that with memory-DoSing the associated Bitcoin Core node. Let me know if you have another concrete estimate for the DoS risk?

I think we should also be aware that the 483 bound might be modified in the future with lightning/bolts#873 (`option_dusty_htlcs_uncounted`). The number of pending inbound HTLCs might then change by an order of magnitude, though I agree we're likely to revisit a few more internal maps anyway.
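The worst-case figure above, spelled out (the 1336-byte entry size and the ~2000-channel node are the thread's numbers, not measurements):

```rust
const CHANNELS: u64 = 2_000;
const MAX_INBOUND_HTLCS_PER_CHANNEL: u64 = 483; // current BOLT 2 bound
const BYTES_PER_PENDING_ADD: u64 = 1_336; // dominated by msgs::OnionPacket

fn main() {
    let worst_case = CHANNELS * MAX_INBOUND_HTLCS_PER_CHANNEL * BYTES_PER_PENDING_ADD;
    assert_eq!(worst_case, 1_290_576_000); // just over 1.29 GB
}
```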
Right, mostly we need to drop the onion packet from the pending add info, as that also makes the monitor updates big. That's separate from this, though.
Can we really drop the onion packet from the pending add info? If the intercepted HTLC is eventually forwarded, we need to send the remaining onion, less our payload, to the next hop; not sure we're in sync here. Though I agree this is not new to this PR, as we might already have traditional HTLCs bloating our maps.
Tracked with #1888
Not sure I understand what "this" is referring to in this comment.
Ah, it's the fact that we have some checks that only apply if the next hop is a "real" channel, so it takes less time to check fake-channel forwards.
Grepping around, this is the first time we mention "LSP" in the codebase. I think it could be opportune to add an entry in `GLOSSARY.md`, at least to document how "we" LDK contributors understand it. The reality behind the term varies a lot across the ecosystem, and it would be good to have a well-defined term to make sense of APIs like this one.
Not opposed but sounds a bit bikeshed-y, would prefer to punt for now
Could say "not handled in time (
HTLC_FAIL_BACK_BUFFER
)"There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`HTLC_FAIL_BACK_BUFFER` isn't `pub`, though.
Yeah, neither is `do_chain_event()`. Okay, I find it nice to give the user hints about where to look in case of issues.
Just a question to check my understanding: is this basically the procedure for an LDK user?

1. Call `ChannelManager::get_intercept_scid` to get a fake scid
2. Generate an invoice with that scid in the route hints
3. Handle the `HTLCIntercepted` event when the HTLC is intercepted
4. Open the channel and forward the HTLC
Yes, that's my understanding, where the LDK user is the LSP in your example (`nodes[1]` in the test), and the end user is `nodes[2]` in the test. I guess step (2) in your example doesn't necessarily need to be on the LDK user's side, as long as the intercept scid is passed to the end user.
Ah, I should have read the test. Yeah, I think I phrased (2) weirdly: the end user generates the invoice but just shoves the scid they got from the LSP into the route hints.
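Putting the steps together, a rough sketch of the LSP side. The `Event::HTLCIntercepted` fields and the `ChannelManager` methods are the PR's API; `open_jit_channel_to` is hypothetical application glue, and `Cm` stands for the node's concrete `ChannelManager` alias with its generic parameters elided:

```rust
use bitcoin::secp256k1::PublicKey;
use lightning::util::events::Event;

// Hypothetical application glue: open a channel to the end user associated
// with this fake scid, returning (channel id, counterparty node id).
fn open_jit_channel_to(_scid: u64) -> Result<([u8; 32], PublicKey), ()> {
    unimplemented!("application-specific")
}

fn handle_event(cm: &Cm, event: Event) {
    if let Event::HTLCIntercepted {
        intercept_id, requested_next_hop_scid,
        expected_outbound_amount_msat, ..
    } = event {
        // requested_next_hop_scid is the fake scid handed out earlier via
        // ChannelManager::get_intercept_scid() and embedded in the end
        // user's invoice route hints.
        match open_jit_channel_to(requested_next_hop_scid) {
            Ok((channel_id, node_id)) => {
                // Forward the expected amount into the fresh channel.
                let _ = cm.forward_intercepted_htlc(
                    intercept_id, &channel_id, node_id,
                    expected_outbound_amount_msat,
                );
            }
            // Channel open failed: fail the HTLC back upstream (this
            // surfaces as unknown_next_peer to the sender).
            Err(()) => { let _ = cm.fail_intercepted_htlc(intercept_id); }
        }
    }
}
```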
We have `APIError::ChannelUnavailable`, in fact; it could be used.
Mutating this line does not break `intercepted_payment`.
Should this choice of onion error message be standardized? Otherwise, if Eclair uses PERM|1 (`invalid_realm`) and LND uses UPDATE|20 (`channel_disabled`), we could end up with an implementation fingerprint in the support of offline receive.

Do you think it would be worth introducing a new PERM|23 (`custom_routing_policy_unsatisfied`) at the spec level as a catch-all for HTLC-intercepting applications?
I think there are easier ways to fingerprint implementations as-is, but it would be good for the LSP spec (which this feature is based on) to specify what error should be returned.
If you can, add a comment suggesting the error could be a custom one, as a reminder to update it if/when the LSP spec adopts one.
I don't think it makes sense to do anything else here, and it definitely doesn't make sense to have a custom error. The error we return here absolutely should be the same error as if we didn't have intercepts enabled, which makes `unknown_next_peer` correct.
Now that I understand this PR is only about JIT channels, I agree we shouldn't introduce a custom error. We can revisit in the future, when offline receive at the receiver's LSP, or things like "delay my HTLC", introduce delayed HTLC processing and it makes sense to return new onion errors.
Not for this PR, but I've wondered whether we should collect the failure code consts somewhere so it's easy to see what they are. I can create an issue if we'd want that, unless I missed some prior reasoning not to.
Yeah, I think that would be nice to have!
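For illustration, such a collection could look like the following; the module name and organization are hypothetical, while the flag and code values are BOLT 4's:

```rust
pub mod onion_failure_codes {
    pub const BADONION: u16 = 0x8000;
    pub const PERM: u16 = 0x4000;
    pub const NODE: u16 = 0x2000;
    pub const UPDATE: u16 = 0x1000;

    /// PERM|10: unknown outgoing channel; per the discussion above, also
    /// what failed/unhandled intercepts return, so they are
    /// indistinguishable from a missing channel.
    pub const UNKNOWN_NEXT_PEER: u16 = PERM | 10;
    /// PERM|1 (invalid_realm), mentioned hypothetically above.
    pub const INVALID_REALM: u16 = PERM | 1;
    /// UPDATE|20 (channel_disabled), mentioned hypothetically above.
    pub const CHANNEL_DISABLED: u16 = UPDATE | 20;
}
```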
nit: "We don't want to generate a PendingHTLCsForwardable event if only intercepted payments are being processed. Each intercepted HTLC generates its own
Event::HTLCIntercepted
. However a single directly forwardable HTLC should generate aPendingHTLCsForwardable
."There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code comment best practices tend to prescribe not duplicating what the code is doing, which is how those additions read to me IMO
Is this even because we want old versions reading this to fail?
Yup! Otherwise, the pending intercept HTLCs would never be handled and channels would force close
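For readers puzzling over "even" here: BOLT 1's "it's OK to be odd" rule means a reader hitting an *unknown even* TLV type must abort the read, while unknown odd types may be skipped. A self-contained illustration of the rule (the actual type number the PR assigns to the map is not shown here):

```rust
/// Writing the intercept map under an even TLV type forces pre-intercept
/// LDK versions to refuse to deserialize the ChannelManager, instead of
/// silently dropping pending HTLCs and later force-closing channels.
fn on_unknown_tlv_type(type_num: u64) -> Result<(), &'static str> {
    if type_num % 2 == 0 {
        Err("unknown even TLV type: refuse to read")
    } else {
        Ok(()) // odd: safe to ignore
    }
}
```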