Support generic HTLC interception #4300
Conversation
While it's nice to document things, the lockorder comment at the top of `ChannelManager` is just annoying to always update and doesn't add all that much value. Developers likely shouldn't be checking it while writing code; our automated lockorder issue detection framework more than suffices to catch any bugs in test-reachable code. That makes it basically write-only, which isn't exactly a useful comment.
When we added support for async payments (which requires holding HTLCs until we receive an onion message), we added the hold logic to `ChannelManager::forward_htlcs`. This made sense as we reused the forwarding data structure in the holding logic and so already had the right types in place, but it turns out only a single call of `forward_htlcs` should ever result in an HTLC being held. All of the other calls (un-holding an HTLC, forwarding an intercepted HTLC, forwarding an HTLC decoded by LDK prior to 0.2, or processing a phantom receive) should never result in an HTLC being held. Instead, HTLCs should actually only ever be held when the HTLC is decoded in `process_pending_update_add_htlcs` before forwarding. Because of this, and because we want to move the interception (and thus also the holding logic) out of `forward_htlcs`, here we move the holding logic into `process_pending_update_add_htlcs`.
If a peer is offline, but only recently went offline and thus the channel has not yet been marked disabled in our gossip, we should be returning `LocalHTLCFailureReason::PeerOffline` rather than `LocalHTLCFailureReason::ChannelNotReady`. Here we fix the error returned and tweak documentation to make the cases clearer.
At various points we've had requests to support more generic HTLC interception in LDK. In most cases, full HTLC interception was not, in fact, the right way to accomplish what the developer wanted, but there have been various times when it might have been. Here, we finally add full HTLC interception support, doing so with a configurable bitfield to allow developers to intercept only certain classes of HTLCs. Specifically, we currently support intercepting HTLCs:

* which were to be forwarded to intercept SCIDs (as was already supported),
* which were to be forwarded to offline private channels (for LSPs to accept HTLCs for offline clients so that they can attempt to wake them before failing the HTLC),
* which were to be forwarded to online private channels (for LSPs to take additional fees or enforce certain policies),
* which were to be forwarded over public channels (for general forwarding policy enforcement),
* which were to be forwarded to unknown SCIDs (for everything else).

Similar to a few commits ago, this requires inlining the HTLC interception logic from `forward_htlcs` to its callsites, specifically `process_pending_update_add_htlcs` and `handle_release_held_htlc`. Note that we do not handle HTLC interception when forwarding an HTLC which was decoded in LDK versions prior to 0.2, which is noted in a suggested release note.
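A configurable bitfield over these five classes might look something like the following sketch. All names here (`InterceptFlags` and its constants) are hypothetical illustrations, not the actual LDK API:

```rust
// Hypothetical sketch of a bitmask over the HTLC classes listed above.
// Names do not match the real rust-lightning types.
#[derive(Clone, Copy, PartialEq)]
pub struct InterceptFlags(u8);

impl InterceptFlags {
    pub const INTERCEPT_SCIDS: InterceptFlags = InterceptFlags(1 << 0);
    pub const OFFLINE_PRIVATE_CHANNELS: InterceptFlags = InterceptFlags(1 << 1);
    pub const ONLINE_PRIVATE_CHANNELS: InterceptFlags = InterceptFlags(1 << 2);
    pub const PUBLIC_CHANNELS: InterceptFlags = InterceptFlags(1 << 3);
    pub const UNKNOWN_SCIDS: InterceptFlags = InterceptFlags(1 << 4);

    /// Combine two sets of flags.
    pub fn union(self, other: InterceptFlags) -> InterceptFlags {
        InterceptFlags(self.0 | other.0)
    }

    /// True if every flag in `other` is set in `self`.
    pub fn contains(self, other: InterceptFlags) -> bool {
        self.0 & other.0 == other.0
    }
}
```

An LSP that only wants to wake offline clients would then configure something like `InterceptFlags::INTERCEPT_SCIDS.union(InterceptFlags::OFFLINE_PRIVATE_CHANNELS)` and let all other classes forward normally.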
👋 Thanks for assigning @tnull as a reviewer!
Codecov Report:

```diff
@@            Coverage Diff             @@
##             main    #4300      +/-   ##
==========================================
- Coverage   89.38%   86.54%    -2.84%
==========================================
  Files         180      158       -22
  Lines      139834   101861    -37973
  Branches   139834   101861    -37973
==========================================
- Hits       124985    88159    -36826
+ Misses      12262    11283      -979
+ Partials     2587     2419      -168
==========================================
```
IIRC, in #3843 and other prior contexts we had discussed that we might want to move away from Event-based interception to a dedicated handler or trait based interface. What do you think about going that way instead? Do you think it would still be easily possible with the approach you took here?
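For context, the dedicated handler/trait-based interface mentioned here might look roughly like the following. This is purely a hypothetical sketch of the design being discussed, not an existing LDK API; `InterceptDecision`, `HtlcInterceptor`, and `MinAmountPolicy` are invented names:

```rust
// Hypothetical sketch of a trait-based interception interface.
pub enum InterceptDecision {
    /// Forward the HTLC as usual.
    Forward,
    /// Hand the HTLC to the user for an async decision.
    Intercept,
    /// Fail the HTLC back immediately.
    Reject,
}

pub trait HtlcInterceptor {
    /// Decide what to do with an HTLC destined for `outgoing_scid`.
    fn decide(&self, amount_msat: u64, outgoing_scid: u64) -> InterceptDecision;
}

/// Example policy: reject forwards below a minimum amount.
struct MinAmountPolicy { min_msat: u64 }

impl HtlcInterceptor for MinAmountPolicy {
    fn decide(&self, amount_msat: u64, _outgoing_scid: u64) -> InterceptDecision {
        if amount_msat < self.min_msat {
            InterceptDecision::Reject
        } else {
            InterceptDecision::Forward
        }
    }
}
```

Unlike the flags-plus-event approach in this PR, such a trait would let the implementor make a synchronous forward/intercept/reject decision without going through the event pipeline.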
joostjager left a comment:
No review yet, just posting some comments.
```rust
// `ChannelManager`. Locks can be held at the same time if they are on the same branch in the tree,
// and should then be taken in the order of the lowest to the highest level in the tree.
// Note that locks on different branches shall not be taken at the same time, as doing so will
// create a new lock order for those specific locks in the order they were taken.
```
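The rule being quoted can be illustrated with a toy example. The field names below (`total_state`, `per_peer`, `pending_events`) are hypothetical stand-ins, not `ChannelManager`'s actual mutexes; the point is only the acquisition pattern:

```rust
use std::sync::Mutex;

// Toy lock tree: `total_state` is the parent of two sibling branches,
// `per_peer` and `pending_events` (hypothetical names).
struct Manager {
    total_state: Mutex<u32>,
    per_peer: Mutex<Vec<u32>>,
    pending_events: Mutex<Vec<&'static str>>,
}

impl Manager {
    // OK: both locks are on the same branch, taken parent-first
    // (lowest to highest level in the tree).
    fn same_branch(&self) {
        let total = self.total_state.lock().unwrap();
        let mut peers = self.per_peer.lock().unwrap();
        peers.push(*total);
    }

    // Locks on sibling branches are taken one at a time, never nested,
    // so no ordering between the two siblings is ever established.
    fn sibling_branches(&self) {
        {
            let mut peers = self.per_peer.lock().unwrap();
            peers.clear();
        }
        {
            let mut events = self.pending_events.lock().unwrap();
            events.push("peers_cleared");
        }
    }
}
```

Nesting the two sibling locks in `sibling_branches` would silently define a new lockorder edge between them, which is exactly what the quoted comment warns against.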
I actually found this a useful explanation. Agreed that it isn't ideal to keep this updated. Can't the comment be formulated in a generic way? Something along the lines of what you put in the commit msg, maybe.
Hmm, maybe something in CONTRIBUTING.md? It's not specific to channelmanager.rs or `ChannelManager` mutexes; generally we kinda just expect devs to "put mutexes where they're needed, make sure you have test coverage, and only worry about lockorder if the test suite complains".
CONTRIBUTING.md seems to describe process rather than architecture. In #4296 I am proposing to add README.md for the lightning crate. It could go there perhaps.
Removing every explanation about this, not sure about that. But if other reviewers agree that it is redundant, I guess it's ok.
CONTRIBUTING.md is important things that contributors need to know, README.md is high-level things that downstream devs need to know. This is definitely not something that should go in README.md, IMO, as downstream devs shouldn't need to think about this at all.
```rust
        });
        let shared_secret = next_hop.shared_secret().secret_bytes();

        macro_rules! fail_htlc {
```
Is there no way to avoid this? Macros are so annoying in navigation and debugging.
It can be a function, but then it can't include the `continue` at the end, which materially reduces the chance of screwing it up (you can't obviously see that there's a missing `continue` just by looking at the callsite). Given it's local to a function and trivial, I don't see much issue with the macro vs trying to use a function and taking the extra chance of future bugs.
I'd argue the opposite. A `continue` hidden inside a macro actually hurts readability at the call site.
I can rename the macro to `fail_htlc_continue` if you prefer.
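For illustration, a minimal, simplified version of the pattern under discussion: a macro that records the failure and then `continue`s the enclosing loop, which a plain helper function could not do. This is a hypothetical sketch, not the actual `fail_htlc!` body from channelmanager.rs:

```rust
// The macro can `continue` the loop it is expanded inside; a function
// returning to the caller could not, so a caller might forget the
// `continue` after calling it.
macro_rules! fail_htlc_continue {
    ($failed:ident, $id:expr) => {{
        $failed.push($id);
        continue; // visible here, but hidden at the call site
    }};
}

/// Partition HTLC ids into (forwarded, failed) based on a validity flag.
fn process(htlcs: &[(u64, bool)]) -> (Vec<u64>, Vec<u64>) {
    let mut forwarded = Vec::new();
    let mut failed = Vec::new();
    for &(id, valid) in htlcs {
        if !valid {
            // Skips the rest of this iteration, including the push below.
            fail_htlc_continue!(failed, id);
        }
        forwarded.push(id);
    }
    (forwarded, failed)
}
```

The naming suggestion above is exactly about making that hidden `continue` visible at the call site.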
```rust
    fn can_forward_htlc(
        &self, msg: &msgs::UpdateAddHTLC, next_packet_details: &NextPacketDetails
    ) -> Result<(), LocalHTLCFailureReason> {
    fn can_forward_htlc_intercepted(
```
This commit is quite large. Is it possible to do a preparatory refactor commit that doesn't add any new functionality yet?
Huh? The commit is +278 in tests and +127 in (mostly) documentation for the new enum, leaving a relatively small change to channelmanager.rs. I don't know how to break up the channelmanager.rs changes in a useful way (the only way I see is to do all the changes but implement `forward_needs_intercept` to only return true if it's a valid intercept, which is a trivial amount of change).
You can never make it too easy for reviewers. Reading one's own changes vs other people's changes is a pretty different experience.

I think that first doing a refactor that keeps the boolean and adding the enum in a second commit is going to make the diff more readable. I did a bit of Clauding to see whether it could do it: main...joostjager:rust-lightning:2025-12-full-interception-split-commits
I'm not sure if it's necessary. I do think that we need to not have a simple boolean "intercept everything" vs "intercept nothing", but the flags approach here seemed simple and I think likely covers the types of decisions a "do you want to consider intercepting this" trait would make. It's possible we're missing some granularity, but likely not too much (maybe the one thing would be to intercept things coming from private channels so that you can give them different fees?).

Now the alternative trait would be one that could return intercept/forward/reject, which there might be some argument for here, mostly for cases where someone wants to limit HTLCs in some cases (eg policies like "only allow forwards to channels with limited liquidity from my private channels/LSP clients, but allow forwards to channels with more liquidity from anyone") where avoiding the event pipeline would be nice.

That said, currently we're basically entirely single-threaded when forwarding no matter what - it all happens in a single method called from the BP. Pushing forwards through the event pipeline just makes forwarding decisions natively-async; it doesn't actually slow them down any more than they already were. Now, there's likely some future case where we want to parallelize forwards a bit and move some of the forwarding work into something that's called in parallel (no idea what that would be? Maybe just fetching only some fresh commitment transactions for peers in a
I don't believe it would be a major change, no, just more boilerplate.

I'd say that moving to a non-event-based interface would be too much scope creep. The flags-based approach is the simplest way to enable the requested functionality?
For LSPS5 (and, generally, being an LSP), we need the ability to intercept and hold HTLCs destined for offline private channels, attempt to wake them and forward the HTLC after they connect (assuming they do). While we're adding interception for additional destinations, there's not any additional complexity to just make it a bitmask and support arbitrary interception, so we do that as well.
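The hold-and-wake flow described above can be sketched in heavily simplified form. All types and names below are hypothetical; the real holding logic lives inside `ChannelManager`'s forwarding pipeline and operates on actual HTLC and peer types:

```rust
use std::collections::HashMap;

// Toy model of holding intercepted HTLCs for offline clients.
// Peers and HTLCs are simplified to plain u64 ids.
struct HeldHtlcs {
    by_peer: HashMap<u64, Vec<u64>>, // offline peer -> held HTLC ids
}

impl HeldHtlcs {
    fn new() -> Self {
        HeldHtlcs { by_peer: HashMap::new() }
    }

    // Called when an HTLC destined for an offline private channel is
    // intercepted: park it and (elsewhere) attempt to wake the client.
    fn hold(&mut self, peer: u64, htlc_id: u64) {
        self.by_peer.entry(peer).or_default().push(htlc_id);
    }

    // Called when the client reconnects: drain everything we were
    // holding so the caller can forward each HTLC (or fail it if the
    // client never came back before the HTLC's timeout).
    fn release(&mut self, peer: u64) -> Vec<u64> {
        self.by_peer.remove(&peer).unwrap_or_default()
    }
}
```

A real implementation additionally has to fail held HTLCs back before their CLTV expiry if the client never reconnects, which this sketch omits.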