multi: Switch store modifications + prevent duplicates in SendHTLC #9777

calvinrzachman · 2025-04-30T18:07:58Z

Change Description

GOAL: Prevent duplicate payment attempts and unintentional loss of funds when using the SendOnion/TrackOnion RPCs to dispatch and track payments. We intend to use the ChannelRouter in a remote process from lnd and the Switch, but this discussion would apply generally to any client looking to use these RPC endpoints.

We have already explored an approach which prevents this using the SendOnion/TrackOnion implementations themselves and requires little to no changes to Switch or ChannelRouter logic. The approach is demonstrated in #9489. Both approaches are summarized there in the Avoiding Duplicate Payment Attempts section.

We expand the ChannelRouter to handle new failure modes when SendHTLC is implemented using RPCs over a network - as is the case when Router and Switch run in separate processes.

The ChannelRouter needs to distinguish between an attempt that was initiated successfully and an attempt for which it is not known whether it was initiated successfully. It must attempt to track the result only for attempts which are known to have been initiated successfully. Otherwise, since we base the retry decision on the result of attempt tracking, we risk making duplicate attempts if SendHTLC is implemented via RPC between two processes communicating over an asynchronous network.

Here “initiated successfully” means an explicit response/acknowledgement from the backend server processing the request. Network/gRPC Errors alone do not appear sufficient to make this determination.
NOTE: This requires an lnd change. It allows us to handle a class of failure modes that is not present when Router + Switch run in the same process. The acknowledgement is not needed for correct function when the Router + Switch run in the same process, because if SendHTLC errors, the HTLC is guaranteed not to be in flight, now or in the future.
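To make the three-way distinction concrete, here is a minimal sketch of how a remote client might classify the result of a send call. All names (`SendOutcome`, `ClassifySend`, `ErrServerRejected`, `ErrTransport`) are hypothetical and are not part of lnd; the sketch only assumes that an explicit server rejection can be told apart from a transport-level failure.

```go
package main

import "errors"

// SendOutcome describes what the client knows after a SendHTLC call made
// over the network. Hypothetical type, for illustration only.
type SendOutcome int

const (
	// OutcomeAccepted: the server explicitly acknowledged the HTLC.
	OutcomeAccepted SendOutcome = iota
	// OutcomeRejected: the server explicitly refused the HTLC, so
	// nothing is in flight.
	OutcomeRejected
	// OutcomeUnknown: no definitive answer was received.
	OutcomeUnknown
)

// ErrServerRejected stands in for a definitive rejection returned by the
// remote Switch. ErrTransport stands in for a timeout or broken connection.
var (
	ErrServerRejected = errors.New("server rejected htlc")
	ErrTransport      = errors.New("transport error")
)

// ClassifySend maps the error from a remote send to an outcome. Anything
// that is not an explicit acknowledgement or rejection stays ambiguous.
func ClassifySend(err error) SendOutcome {
	switch {
	case err == nil:
		return OutcomeAccepted
	case errors.Is(err, ErrServerRejected):
		return OutcomeRejected
	default:
		return OutcomeUnknown
	}
}

// MayTrack reports whether it is safe to start tracking the attempt result.
// Only explicitly acknowledged attempts qualify.
func MayTrack(o SendOutcome) bool {
	return o == OutcomeAccepted
}
```

Only `OutcomeAccepted` leads to tracking here. An `OutcomeUnknown` attempt is neither tracked nor treated as failed until the ambiguity is resolved.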

Originally, I thought that this approach would require the persisting of the acknowledgement from the Switch server using something like:

Persisted Server Acknowledgment of HTLC Receipt & Dispatch + Modified ChannelRouter Startup

This approach involves an idempotent SendOnion implementation combined with an RPC client which persists acknowledgement of successful onion/HTLC receipt and dispatch from the server. The client can wait, retrying if necessary, until it gets an explicit acknowledgment from the server about the HTLC’s status before ever calling TrackOnion.
- To handle restarts, it must persist this acknowledgement and differentiate between ACK’d and UNACK’d attempts, handling them differently on startup. NOTE: This would require ChannelRouter changes for how we intend to use these RPCs ⚠️
- NOTE: This PR currently takes this approach!

This sequence (RegisterAttempt, SendHTLC, and AcknowledgeAttempt) is helpful in a distributed environment to:

- Persist the intent to send HTLCs (RegisterAttempt).
- Execute the actual HTLC delivery (SendHTLC).
- Resolve ambiguity about the HTLC’s state (AcknowledgeAttempt) in a manner that survives restarts.

On restart, query the ControlTower for in-flight attempts:

- Acknowledged Attempts: Launch resultCollector (calls GetAttemptResult) to track these attempts.
- Non-Acknowledged Attempts: Retry SendHTLC to resolve ambiguity.

NOTE: GetAttemptResult CANNOT be used to resolve the ambiguity when communication happens over the network!
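The restart handling described above could be sketched roughly as follows. The `Attempt` record, its `Acked` flag, and `PlanResume` are hypothetical names used for illustration; the real ControlTower types in lnd differ.

```go
package main

// Attempt is a hypothetical in-flight attempt record as it might be loaded
// from the ControlTower on startup.
type Attempt struct {
	ID    uint64
	Acked bool // true once the server acknowledgement has been persisted
}

// ResumePlan splits in-flight attempts by how they must be handled.
type ResumePlan struct {
	Track  []uint64 // acknowledged: safe to collect the result
	Resend []uint64 // not acknowledged: retry SendHTLC first
}

// PlanResume decides, for each in-flight attempt, whether the result can be
// tracked right away or whether SendHTLC must be retried to resolve the
// ambiguity about its state.
func PlanResume(inFlight []Attempt) ResumePlan {
	var plan ResumePlan
	for _, a := range inFlight {
		if a.Acked {
			plan.Track = append(plan.Track, a.ID)
			continue
		}
		plan.Resend = append(plan.Resend, a.ID)
	}
	return plan
}
```

Attempts in `Resend` are never handed to the result collector directly. A "not found" answer from tracking would not prove that the original request is not still on its way to the server.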

This is roughly demonstrated in calvinrzachman#14.

This PR explores a modification to the above approach to minimize the necessary ChannelRouter changes. To accomplish this we update the contract or assurance offered by the SendHTLC() method by making it safe to call multiple times with the same attempt ID. This allows us to guarantee that:

When SendHTLC returns, we must be certain that either the HTLC is in-flight (no error)
and thus can be tracked OR that the HTLC is not in-flight and won't ever be in flight and 
thus can be re-attempted.

With this assurance we safely and defensively call SendHTLC while resuming payments on startup. This should work both in the case where Router and Switch run in the same process (as is the case for the large majority of lnd deployments) and in the scenario where Router and Switch run remotely.
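As an illustration of the defensive call on startup, here is a small sketch with an in-memory stand-in for the Switch. The names `Sender`, `ResumeAttempt`, `ErrDuplicateAttempt`, `ErrRefused`, and `memSwitch` are hypothetical. The sketch assumes the `Sender` hides any transport-level retries and only returns once it has a definitive answer.

```go
package main

import "errors"

// Hypothetical sentinel errors for this sketch.
var (
	// ErrDuplicateAttempt: the attempt ID was already dispatched.
	ErrDuplicateAttempt = errors.New("duplicate attempt id")
	// ErrRefused: the HTLC is not and never will be in flight.
	ErrRefused = errors.New("htlc refused")
)

// Sender is the idempotent dispatch contract assumed by this sketch.
type Sender interface {
	SendHTLC(attemptID uint64) error
}

// Action is what the router does next for a resumed attempt.
type Action int

const (
	ActionTrack     Action = iota // HTLC is in flight, collect its result
	ActionReattempt               // HTLC is not in flight, safe to retry
)

// ResumeAttempt calls SendHTLC defensively. Success and a duplicate error
// both mean the HTLC is in flight. Any other definitive error means it is not.
func ResumeAttempt(s Sender, attemptID uint64) Action {
	err := s.SendHTLC(attemptID)
	if err == nil || errors.Is(err, ErrDuplicateAttempt) {
		return ActionTrack
	}
	return ActionReattempt
}

// memSwitch is an in-memory stand-in for a remote Switch.
type memSwitch struct {
	inFlight map[uint64]bool
	refused  map[uint64]bool // IDs this stand-in definitively refuses
}

func newMemSwitch() *memSwitch {
	return &memSwitch{
		inFlight: make(map[uint64]bool),
		refused:  make(map[uint64]bool),
	}
}

// SendHTLC accepts each attempt ID once and reports later calls as duplicates.
func (m *memSwitch) SendHTLC(id uint64) error {
	if m.refused[id] {
		return ErrRefused
	}
	if m.inFlight[id] {
		return ErrDuplicateAttempt
	}
	m.inFlight[id] = true
	return nil
}
```

Calling `ResumeAttempt` twice for the same ID yields `ActionTrack` both times. That is the property that lets the Router call SendHTLC again after a restart without risking a second HTLC.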

The SwitchRPC server will be hidden behind a build tag.

Add RPC for dispatching payments via onions. The payment route and onion are computed by the caller and the onion is delivered to the server for forwarding. NOTE: The server does NOT process or peel the onion, so it is assumed that the onion will be constructed such that the first hop is encrypted to one of the server's channel partners.

Allow the switch to defer error handling when callers of GetAttemptResult do not provide an error decrypter.

Add RPC to look up the status of a previously forwarded onion. Allow callers of the TrackOnion RPC to indicate whether they would like to handle errors themselves or delegate error decryption to the server. We take care to return ErrPaymentIDNotFound across the RPC boundary to the RPC caller. This will allow the caller of TrackOnion to explicitly confirm that there is no HTLC in-flight for the supplied attempt ID, so it is free to safely re-attempt the payment.
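A rough sketch of why the sentinel needs to survive the RPC boundary: the server maps the error to a stable wire code, and the client only treats that exact code as proof that nothing is in flight. The wire codes and helper names below are hypothetical and do not reflect the actual switchrpc proto. `ErrPaymentIDNotFound` is declared locally as a stand-in for the error named above.

```go
package main

import "errors"

// ErrPaymentIDNotFound is a local stand-in for the Switch error of the
// same name. ErrStoreUnavailable is a hypothetical unrelated failure.
var (
	ErrPaymentIDNotFound = errors.New("paymentID not found")
	ErrStoreUnavailable  = errors.New("result store unavailable")
)

// Hypothetical wire codes carried in the RPC response.
const (
	CodeOK       = ""
	CodeNotFound = "PAYMENT_ID_NOT_FOUND"
	CodeInternal = "INTERNAL"
)

// EncodeTrackError is the server side: it preserves the one error the
// caller must be able to recognize and collapses everything else.
func EncodeTrackError(err error) string {
	switch {
	case err == nil:
		return CodeOK
	case errors.Is(err, ErrPaymentIDNotFound):
		return CodeNotFound
	default:
		return CodeInternal
	}
}

// SafeToReattempt is the client side: only an explicit "not found" confirms
// that no HTLC exists for the attempt ID.
func SafeToReattempt(code string) bool {
	return code == CodeNotFound
}
```

If the sentinel were flattened into a generic error, the client could not tell "no such attempt" apart from "the server had a problem", and re-attempting would no longer be safe.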

Add RPC which constructs a sphinx onion packet for the given payment route. NOTE: This is added primarily to aid with the itests added later.

This demonstrates how the Switch and SendOnion RPC behave when asked to dispatch duplicate onions. Notably, the Switch circuit map detects this, but only if the matching onion is still in flight. Once the circuit is torn down, the duplicate is permitted by the Switch. It is likely that we will add a layer of protection to the SendOnion call itself to prevent duplicates even after the first HTLC is no longer in-flight. TODO: Determine whether this SendOnion duplication protection should persist across restarts.

This allows users of the SendOnion RPC to include all fields that we support using with the UpdateAddHtlc type.

Preparatory refactor to allow for future alteration of the store backing the Switch.

TODO: is this needed after pushing pending result handling deeper into the store and InitAttempt within SendHTLC rather than SendOnion?

Add a new message for pending HTLC attempt results in the Switch's network result store. This serves as a placeholder to use during initialization of an attempt within the store, which will be replaced with either a SETTLE or FAIL message once the HTLC attempt result is received from the network. NOTE: This message is not sent to or received from channel peers. It was introduced only to avoid having to change the on-disk structure of the network result store.

This can be used to initialize the result store for a given attempt ID prior to sending the HTLC out to the network.

We explore what benefit upstream clients gain if the underlying SendHTLC implementation (and by extension the SendOnion RPC) refuses to forward the same attempt ID twice unless the result for that ID has been cleaned from the result store. We accomplish the duplicate prevention using the InitAttempt method of the Switch Store.
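The following is a minimal in-memory sketch of the idea, not the store implementation in this PR: InitAttempt writes a pending placeholder and rejects any attempt ID that already has an entry, whether pending or resolved, until the entry is cleaned. The type and error names (`ResultStore`, `ErrAttemptExists`, `ErrUnknownAttempt`) are assumptions made for illustration.

```go
package main

import (
	"errors"
	"sync"
)

// ResultKind mirrors the three states an entry can have in this sketch.
type ResultKind int

const (
	KindPending ResultKind = iota // placeholder written by InitAttempt
	KindSettle
	KindFail
)

// Hypothetical sentinel errors for this sketch.
var (
	ErrAttemptExists  = errors.New("attempt already initialized")
	ErrUnknownAttempt = errors.New("attempt was never initialized")
)

// ResultStore is an in-memory stand-in for the network result store.
type ResultStore struct {
	mu      sync.Mutex
	results map[uint64]ResultKind
}

func NewResultStore() *ResultStore {
	return &ResultStore{results: make(map[uint64]ResultKind)}
}

// InitAttempt records a pending placeholder before the HTLC is sent out.
// It fails if any entry exists for the ID, so a duplicate is caught even
// after the first HTLC has settled or failed.
func (s *ResultStore) InitAttempt(id uint64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.results[id]; ok {
		return ErrAttemptExists
	}
	s.results[id] = KindPending
	return nil
}

// StoreResult replaces the placeholder with the final result.
func (s *ResultStore) StoreResult(id uint64, kind ResultKind) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.results[id]; !ok {
		return ErrUnknownAttempt
	}
	s.results[id] = kind
	return nil
}

// Clean removes the entry, after which the ID may be initialized again.
func (s *ResultStore) Clean(id uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.results, id)
}
```

Unlike a check against the circuit map alone, the entry here outlives the HTLC itself, so the duplicate check keeps working after the circuit has been torn down.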

We can now assert that making multiple calls to SendOnion for the same attempt ID is prevented.

We need to ensure that each of our test nodes is aware of the necessary channels. Not sure if AssertChannelActive or AssertChannelInGraph is better for this purpose.

This allows any ambiguity in the status of an HTLC to be resolved in the case where the ChannelRouter and Switch run in separate processes and communicate over the network. When SendHTLC returns, the Router is expected to know, without any uncertainty, whether the HTLC is in flight (no error) or is neither in flight nor will ever be in flight. This is tricky to guarantee when the communication happens over a network. Instead, we allow the server to resolve any uncertainty by calling SendHTLC while resuming a payment, prior to attempting to track the result. If the HTLC was successfully received by the remote Switch, then the Router will receive a duplicate HTLC error and can proceed to track the attempt result as normal.

coderabbitai · 2025-04-30T18:08:05Z

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)

llm-review

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

calvinrzachman · 2025-04-30T18:08:32Z

Mistakenly opened this. My apologies.

calvinrzachman added 28 commits April 28, 2025 20:25

switchrpc: configuration

f8d9f87

The SwitchRPC server will be hidden behind a build tag.

switchrpc: add logger

e32d671

switchrpc: add sub-server dependencies

ea04925

switchrpc: subserver functionality

76b0b47

switchrpc: add new SendOnion rpc proto

a51e676

htlcswitch: return encrypted error blob when missing deobfuscator

7b64df9

Allow the switch to defer error handling when callers of GetAttemptResult do not provide an error decrypter.

switchrpc: add new TrackOnion rpc proto

6bd73e2

switchrpc: add new BuildOnion rpc proto

0854be5

switchrpc: add BuildOnion rpc

096fb34

Add RPC which constructs a sphinx onion packet for the given payment route. NOTE: This is added primarily to aid with the itests added later.

lntest: add switchrpc methods to harness

141390f

itest: add send_onion test

a299230

itest: add track_onion test

6cd3050

temp: update SendOnion proto to support chan_id

00f734f

temp: allow caller to provide chan_id to SendOnion

84a39e3

temp: update send onion itest to use chan id

19c643b

switchrpc: support all fields from UpdateAddHtlc

3603b51

This allows users of the SendOnion RPC to include all fields that we support using with the UpdateAddHtlc type.

switchrpc: pass through extra update_add_htlc fields

ff16e55

htlcswitch: add SwitchStore interface

cdbc59e

Preparatory refactor to allow for future alteration of the store backing the Switch.

switchrpc: use new switch store

83456a8

TODO: is this needed after pushing pending result handling deeper into the store and InitAttempt within SendHTLC rather than SendOnion?

htlcswitch: add InitAttempt method to switch store

425eacb

This can be used to initialize the result store for a given attempt ID prior to sending the HTLC out to the network.

itest: update sendonion itest given duplicate protection

c8a29c5

We can now assert that making multiple calls to SendOnion for the same attempt ID is prevented.

itest: fix race in send_onion itest

be9d20a

We need to ensure that each of our test nodes is aware of the necessary channels. Not sure if AssertChannelActive or AssertChannelInGraph is better for this purpose.

calvinrzachman closed this Apr 30, 2025
