
@calvinrzachman
Contributor

Change Description

GOAL: Prevent duplicate payment attempts and unintentional loss of funds when using the SendOnion/TrackOnion RPCs to dispatch and track payments. We intend to run the ChannelRouter in a process separate from lnd and the Switch, but this discussion applies generally to any client looking to use these RPC endpoints.

We have already explored an approach which prevents this using the SendOnion/TrackOnion implementations themselves and requires little to no changes to Switch or ChannelRouter logic. The approach is demonstrated in #9489. Both approaches are summarized there in the Avoiding Duplicate Payment Attempts section.

We extend the ChannelRouter to handle new failure modes that arise when SendHTLC is implemented using RPCs over a network, as is the case when the Router and Switch run in separate processes.

The ChannelRouter needs to distinguish between an attempt that was initiated successfully and an attempt for which it is not known whether it was initiated successfully. It must attempt to track the result only for attempts known to have been initiated successfully. Otherwise, since we base retry decisions on the result of attempt tracking, we risk making duplicate attempts if SendHTLC is implemented via RPC between two processes communicating over an asynchronous network.

  • Here “initiated successfully” means an explicit response/acknowledgement from the backend server processing the request. Network/gRPC errors alone do not appear sufficient to make this determination.
  • NOTE: This requires an lnd change. It allows us to handle a class of failure mode that is not present when Router + Switch run in the same process. The acknowledgement is not needed for correct function when the Router + Switch run in the same process, because if SendHTLC errors, the HTLC is guaranteed never to have been in flight and never to become so.

Originally, I thought that this approach would require persisting the acknowledgement from the Switch server using something like:

Persisted Server Acknowledgment of HTLC Receipt & Dispatch + Modified ChannelRouter Startup

  • This approach involves idempotent SendOnion implementation combined with an RPC client which persists acknowledgement of successful onion/HTLC receipt and dispatch from the server. The client can wait, retrying if necessary, until it gets an explicit acknowledgment from the server about the HTLC’s status before ever calling TrackOnion.
    • To handle restarts, it must persist this acknowledgement and differentiate between ACK’d and UNACK’d attempts, handling them differently on startup. NOTE: This would require ChannelRouter changes for how we intend to use these RPCs ⚠️
    • NOTE: This PR currently takes this approach!

This sequence (RegisterAttempt, SendHTLC, and AcknowledgeAttempt) is helpful in a distributed environment to:

  1. Persist the intent to send HTLCs (RegisterAttempt).
  2. Execute the actual HTLC delivery (SendHTLC).
  3. Resolve ambiguity about the HTLC’s state (AcknowledgeAttempt) in a manner that survives restarts.

On restart, query the ControlTower for in-flight attempts:

  1. Acknowledged Attempts: Launch resultCollector (calls GetAttemptResult) to track these attempts.
  2. Non-Acknowledged Attempts: Retry SendHTLC to resolve ambiguity.
  • NOTE: GetAttemptResult CANNOT be used to resolve the ambiguity when communication happens over the network!
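A minimal sketch of the startup decision described above, assuming each persisted attempt carries a flag set by AcknowledgeAttempt. The `attempt` type, its `Acked` field, `resumeAction`, and `planResume` are hypothetical names for illustration, not lnd's ControlTower types.

```go
package main

import "fmt"

// attempt is a hypothetical persisted in-flight attempt record.
type attempt struct {
	ID    uint64
	Acked bool // set by AcknowledgeAttempt once the server confirmed receipt
}

// resumeAction is what the Router does with an attempt on startup.
type resumeAction string

const (
	actionTrack  resumeAction = "track"  // launch resultCollector
	actionResend resumeAction = "resend" // retry SendHTLC to resolve ambiguity
)

// planResume decides, for each in-flight attempt found on startup, whether
// its result can be tracked directly or it must first be re-sent.
func planResume(inFlight []attempt) map[uint64]resumeAction {
	plan := make(map[uint64]resumeAction, len(inFlight))
	for _, a := range inFlight {
		if a.Acked {
			plan[a.ID] = actionTrack
			continue
		}
		// GetAttemptResult cannot tell whether an unacknowledged HTLC
		// ever reached a remote Switch, so re-send it instead.
		plan[a.ID] = actionResend
	}
	return plan
}

func main() {
	plan := planResume([]attempt{{ID: 1, Acked: true}, {ID: 2}})
	fmt.Println(plan[1], plan[2]) // prints: track resend
}
```

Re-sending an unacknowledged attempt is only safe because this approach pairs it with an idempotent SendOnion implementation.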

This is roughly demonstrated in calvinrzachman#14.

This PR explores a modification to the above approach to minimize the necessary ChannelRouter changes. To accomplish this, we update the contract (or assurance) offered by the SendHTLC() method by making it safe to call multiple times with the same attempt ID. This allows us to guarantee that:

When SendHTLC returns, we must be certain that either the HTLC is in flight (no error)
and thus can be tracked, OR the HTLC is not in flight and never will be, and
thus can be re-attempted.

With this assurance, we can safely and defensively call SendHTLC while resuming payments on startup. This should work both when the Router and Switch run in the same process (as is the case for the large majority of lnd deployments) and when they run remotely.

The SwitchRPC server will be hidden behind a build tag.

Add RPC for dispatching payments via onions. The payment
route and onion are computed by the caller and the onion
is delivered to the server for forwarding.

NOTE: The server does NOT process or peel the onion, so it is assumed
that the onion will be constructed such that the first hop is encrypted
to one of the server's channel partners.

Allow the Switch to defer error handling when callers of GetAttemptResult
do not provide an error decrypter.
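A minimal sketch of deferring failure decryption: when no decrypter is supplied, the raw encrypted failure reason is handed back for the caller to decrypt. `errorDecrypter`, `attemptResult`, `resolveFailure`, and the toy `xorDecrypter` are hypothetical names; a real onion error decrypter removes each hop's obfuscation layer rather than applying a single XOR.

```go
package main

import (
	"errors"
	"fmt"
)

// errorDecrypter is a hypothetical stand-in for an onion error decrypter.
type errorDecrypter interface {
	DecryptError(reason []byte) (string, error)
}

// attemptResult is a hypothetical failed-attempt result. At most one field
// is populated, depending on who decrypts the failure.
type attemptResult struct {
	EncryptedReason []byte // set when decryption is deferred to the caller
	FailureMsg      string // set when the server decrypted the failure
}

// resolveFailure decrypts the failure reason if a decrypter is supplied and
// otherwise returns the raw encrypted reason for the caller to handle.
func resolveFailure(reason []byte, d errorDecrypter) (attemptResult, error) {
	if d == nil {
		return attemptResult{EncryptedReason: reason}, nil
	}
	msg, err := d.DecryptError(reason)
	if err != nil {
		return attemptResult{}, fmt.Errorf("unable to decrypt failure: %w", err)
	}
	return attemptResult{FailureMsg: msg}, nil
}

// xorDecrypter is a toy decrypter used only for this demo.
type xorDecrypter struct{ key byte }

func (x xorDecrypter) DecryptError(reason []byte) (string, error) {
	if len(reason) == 0 {
		return "", errors.New("empty failure reason")
	}
	out := make([]byte, len(reason))
	for i, b := range reason {
		out[i] = b ^ x.key
	}
	return string(out), nil
}

func main() {
	enc := []byte{'o' ^ 7, 'k' ^ 7}
	deferred, _ := resolveFailure(enc, nil)
	decoded, _ := resolveFailure(enc, xorDecrypter{key: 7})
	fmt.Println(len(deferred.EncryptedReason), decoded.FailureMsg) // prints: 2 ok
}
```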

Add RPC to look up the status of a previously forwarded
onion. Allow callers of the TrackOnion RPC to indicate
whether they would like to handle errors themselves or
delegate error decryption to the server.

We take care to return ErrPaymentIDNotFound across RPC
boundary to the RPC caller. This will allow the caller
of TrackOnion to explicitly confirm that there is no HTLC
in-flight for the supplied attempt ID, so it is free to
safely re-attempt the payment.

Add RPC which constructs a sphinx onion packet for the
given payment route.

NOTE: This is added primarily to aid with the itests added later.

This demonstrates how the Switch and SendOnion rpc
behave when asked to dispatch duplicate onions. Notably,
the Switch circuit map detects this, but only if the
matching onion is still in flight. Once the circuit is
torn down, the Switch permits the duplicate.

It is likely that we will add a layer of protection to the
SendOnion call itself to prevent duplicates even after the
first HTLC is no longer in-flight.

TODO: Determine whether this SendOnion duplicate protection
should persist across restarts.

This allows users of the SendOnion RPC to include all
fields that we support on the UpdateAddHtlc type.

Preparatory refactor to allow for future alteration
of the store backing the Switch.

TODO: Is this needed after pushing pending
result handling deeper into the store and calling
InitAttempt within SendHTLC rather than SendOnion?

Add a new message for pending HTLC attempt results
in the Switch's network result store. This serves as
a placeholder to use during initialization of an
attempt within the store which will be replaced with
either a SETTLE or FAIL message once the HTLC attempt
result is received from the network.

NOTE: This message is not sent or received externally
to channel peers. It was introduced only to avoid having
to change the on-disk structure of the network result
store.

This can be used to initialize the result store for a
given attempt ID prior to sending the HTLC out to the
network.

We explore what benefit upstream clients gain if
the underlying SendHTLC implementation (and by extension
the SendOnion RPC) refuses to forward the same attempt ID
twice before the result for that ID has been cleaned
from the result store.

We accomplish the duplicate prevention using the
InitAttempt method of the Switch Store.

We can now assert that making multiple calls to SendOnion for
the same attempt ID is prevented.

We need to ensure that each of our test nodes
is aware of the necessary channels. Not sure whether
AssertChannelActive or AssertChannelInGraph is better
for this purpose.

This allows us to resolve any ambiguity in the status
of an HTLC in the case where the ChannelRouter and Switch run
in separate processes and communicate over the network.

When SendHTLC returns, the Router is expected to know, without
any uncertainty, whether the HTLC is in flight (no error) or is
neither in flight nor ever will be. This is tricky to guarantee
when the communication happens over a network.

Instead, we let the server resolve any uncertainty: the Router calls
SendHTLC while resuming a payment, before attempting to track
the result. If the HTLC was already received by the remote
Switch, the Router receives a duplicate HTLC error and
can proceed to track the attempt result as usual.
@coderabbitai
Contributor

coderabbitai bot commented Apr 30, 2025

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)
  • llm-review

@calvinrzachman
Contributor Author

Mistakenly opened this. My apologies.
