multi: update close logic to handle re-orgs of depth n-1, where n is num confs - add min conf floor #10331
Conversation
Code Review
This is a substantial and well-executed pull request that refactors the chain_watcher to be more robust against chain reorganizations by implementing a state machine for handling spend confirmations. The logic for determining confirmation numbers for channel closes and funding has been centralized, and a testing override has been added. The test coverage for the new logic is excellent, with new unit tests, property-based tests, and integration tests. A new devrpc endpoint is also added to aid in testing. The changes are well-structured and commented. I have one suggestion regarding a potential resource leak.
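For orientation, here is a minimal sketch of the spend-tracking states this review refers to, using the state names that appear later in this thread (spendStateNone, spendStatePending, spendStateConfirmed); the actual definitions in contractcourt/chain_watcher.go may differ.

```go
// The states referenced later in this thread, sketched for orientation;
// the exact definitions in contractcourt/chain_watcher.go may differ.
type spendState uint8

const (
	// spendStateNone: no spend of the funding output has been seen yet.
	spendStateNone spendState = iota

	// spendStatePending: a spend was detected and the watcher is waiting
	// for N confirmations before dispatching the close event.
	spendStatePending

	// spendStateConfirmed: the spend reached the required depth and the
	// close has been dispatched.
	spendStateConfirmed
)
```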
yyforyongyu left a comment:
Approach ACK⚖️
@Roasbeef, remember to re-request review from reviewers when ready
yyforyongyu left a comment:
Looking good, just a few questions and nits, but very close!
// confirmation counts.
testCases := []struct {
	name          string
	capacityScale float64
I don't think this capacityScale is used to drive different test cases?
Ah yeah you're right. My initial idea was to use that to increase the size of the channel to scale the confs w/ expected confs. I then added withRequiredConfs, but forgot to use that here. Updated the test to properly use that now.
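For illustration, a hedged fragment of how the table could be driven by an explicit conf requirement (the withRequiredConfs idea above) instead of the unused capacityScale field; the field names come from snippets in this thread, the case values are made up:

```go
// Illustrative only: drive the cases by an explicit conf requirement
// instead of the unused capacityScale field.
testCases := []struct {
	name          string
	requiredConfs uint32
	reorgDepth    int32
}{
	{name: "three_confs_shallow_reorg", requiredConfs: 3, reorgDepth: 2},
	{name: "six_confs_deeper_reorg", requiredConfs: 6, reorgDepth: 5},
}
```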
peer/brontide.go (outdated):
closeReq.Updates <- &PendingUpdate{
	Txid:           closingTxid[:],
	IsLocalCloseTx: fn.Some(true),
nit: looks like it's unrelated and it's from the RBF coop close PR😂
Hmm, yeah I was testing this rebased on top of that branch to exercise the expected behavior. Removing for now, will revisit as there may have been some undocumented assumption here.
yyforyongyu left a comment:
Looking good, just pending the unit test fix and the linter, otherwise good to go!
{
	name:          "single_conf",
	requiredConfs: 1,
	reorgDepth:    1,
Looks like this test is failing, which I think is expected, as whenever the requiredConf is reached, we no longer watch for reorgs. IIUC this test case can be removed?
Yeah, we can handle re-orgs of n-1, so this is a re-org we shouldn't be able to handle, since this tests for a re-org of n. Consider calling into runCoopCloseFlow with shouldReorg set to false.
ziggie1984 left a comment:
Important PR, I think this gives us quite a bit of robustness against reorgs, thank you for working on this 🙏, the property tests are really nice!
lnwallet/confscale_prod.go (outdated):
// before signaling a cooperative close, scaled by channel capacity.
// We enforce a minimum of 3 confirmations to provide better reorg protection,
// even for small channels.
func CloseConfsForCapacity(capacity btcutil.Amount) uint32 { //nolint:revive
Not sure what the revive linter does, but I think it can be removed. Can you check, @gemini-code-assist?
The //nolint:revive comment is used to suppress warnings from the revive linter. revive is a Go linter that checks for various code style and best practice issues.
In this specific case, the CloseConfsForCapacity function already has a clear and comprehensive doc comment (lines 8-11). Given this, the most common revive warning (which flags exported functions without documentation) is not applicable here.
Therefore, the //nolint:revive comment on line 12 appears to be redundant and can be safely removed. It's possible it was added preemptively or to address a revive rule that is no longer active or relevant to this function.
lnwallet/confscale_prod.go (outdated):
import "github.com/btcsuite/btcd/btcutil"

// CloseConfsForCapacity returns the number of confirmations to wait
// before signaling a cooperative close, scaled by channel capacity.
I don't think this is only used for coop close: this function, CloseConfsForCapacity, is also used in the closeObserver, which also tracks force closes.
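For context, here is a hedged sketch of what the capacity-scaled helper could look like, assuming ScaleNumConfs is the existing funding-side scaling function mentioned elsewhere in this review; the real body in lnwallet/confscale_prod.go may differ.

```go
package lnwallet

import "github.com/btcsuite/btcd/btcutil"

// minCloseConfs is the 3-conf floor mentioned in the quoted doc comment.
const minCloseConfs = 3

// CloseConfsForCapacity, sketched here, reuses the funding-side scaling
// (ScaleNumConfs with a zero push amount, per the test quoted later in
// this review) and clamps the result to the floor.
func CloseConfsForCapacity(capacity btcutil.Amount) uint32 {
	confs := ScaleNumConfs(capacity, 0)
	if confs < minCloseConfs {
		return minCloseConfs
	}

	return confs
}
```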
AuxSigner fn.Option[lnwallet.AuxSigner]

// AuxResolver is an optional interface that can be used to modify the
// way contracts are resolved.
This commit msg is a bit misleading: essentially this is only used by tests, but the commit msg suggests that it is actually used for later chain handling logic. I think it should clearly state that this change just relates to tests.
// Send initial spend notification.
// This will trigger handleCommitSpend which will detect the coop close
// and call waitForCoopCloseConfirmation (which blocks).
I cannot find this function waitForCoopCloseConfirmation in our codebase. Also, we do not always trigger handleCommitSpend; it depends on the close confirmation number. Can you verify this, @gemini-code-assist?
You are correct on both counts, @ziggie1984!
- The function waitForCoopCloseConfirmation indeed does not exist in the current codebase. The logic for waiting for confirmations is now integrated directly into the closeObserver's state machine within contractcourt/chain_watcher.go. Specifically, after a spend is detected, handleSpendDetection registers for a ConfirmationEvent, and the closeObserver then waits on the Confirmed channel of this event.
- Your understanding that handleCommitSpend is not always triggered immediately is also correct. It is now called after the required number of confirmations are met (via the confChan in the closeObserver's select statement), or immediately if numConfs == 1 (handled by handleSpendDispatch). This ensures that the close confirmation number is respected before processing the spend.

It appears the comment on line 490 of contractcourt/chain_watcher_test_harness.go is outdated and refers to a previous implementation detail. I recommend updating it to reflect the current state machine logic. Suggested change:
- // and call waitForCoopCloseConfirmation (which blocks).
+ // This will trigger handleCommitSpend after the required confirmations
+ // are met (or immediately in single-conf scenarios).
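To make the wait described above concrete, here is a small standalone model of it; the channel names mirror the ones mentioned in this thread (Confirmed, NegativeConf), but this is a sketch, not lnd's notifier API.

```go
// Small model of the wait described above: after a spend is detected, the
// observer blocks until either the required depth is reached (dispatch)
// or the spend is re-org'd out (reset). Not lnd's actual code.
func awaitSpendOutcome(confirmed <-chan struct{}, negativeConf <-chan int32,
	quit <-chan struct{}) bool {

	select {
	// The spend reached the required depth: it is now safe to hand it
	// to handleCommitSpend and dispatch the close event.
	case <-confirmed:
		return true

	// The spend was re-org'd out before reaching the required depth:
	// reset to the initial state and re-register for any spend of the
	// funding output.
	case <-negativeConf:
		return false

	// The watcher is shutting down.
	case <-quit:
		return false
	}
}
```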
}

// Define reorg depths for each transition.
reorgDepths := []int32{1, 2, 3}
How can we support a reorg of 3 if only reorgs of n-1 are supported and the default confirmation depth is 3?
I wonder if my analysis is right, but it seems to me that we have some test gaps in how the harness is designed:
Current Test Gap
The current tests don't reflect this reality:
- ❌ They assume tx starts at 0 confirmations
- ❌ They don't simulate block mining between detection and reorg
- ❌ They don't test confirmation progress during reorg window
You should add tests that:
- ✅ Mine blocks to simulate confirmations accumulating
- ✅ Trigger reorgs at realistic depths during confirmation window
- ✅ Verify state machine handles partial confirmation progress
@gemini-code-assist can you check whether this is the case
Thanks for your detailed analysis, @ziggie1984! I've reviewed the new test cases and the chainWatcher logic in this pull request, and it appears that the concerns you've raised regarding test gaps have been addressed by the new changes.
Let's break down your points:
- "❌ They assume tx starts at 0 confirmations"
  - This is not the case in the new tests. For example, in TestChainWatcherCoopCloseReorg (lines 14-29 in contractcourt/chain_watcher_coop_reorg_test.go), harness.sendSpend(tx) is followed by harness.waitForConfRegistration() and then harness.mineBlocks(2) before harness.triggerReorg(tx, 2). This means the transaction gains confirmations before a reorg is triggered.
  - The runCoopCloseFlow helper function, used by many tests, also ensures that confirmations accumulate before a reorg is simulated.
- "❌ They don't simulate block mining between detection and reorg"
  - The new tests explicitly simulate block mining. As mentioned above, harness.mineBlocks(N) is called in various tests to advance the block height and simulate confirmations accumulating before a reorg event.
  - The property-based test testReorgProperties in contractcourt/chain_watcher_reorg_test.go specifically generates a random number of blocksToMine (between 1 and requiredConfs-1) before triggering a reorg, directly addressing this point.
- "❌ They don't test confirmation progress during reorg window"
  - The closeObserver in contractcourt/chain_watcher.go now implements a state machine (spendStateNone, spendStatePending, spendStateConfirmed). When a spend is detected, it transitions to spendStatePending and registers for N confirmations. If a reorg occurs (via negativeConfChan) while in spendStatePending (i.e., before N confirmations are reached), it correctly transitions back to spendStateNone and re-registers for spend notifications.
  - The tests, by mining blocks to achieve partial confirmations and then triggering a reorg, are indeed testing this confirmation progress within the reorg window and the state machine's ability to handle it.

In summary, the new test suite, particularly contractcourt/chain_watcher_coop_reorg_test.go and contractcourt/chain_watcher_reorg_test.go (which uses property-based testing for various close types and reorg scenarios), seems to cover these aspects comprehensively. The chainWatcherTestHarness also provides the necessary primitives (mineBlocks, triggerReorg, waitForConfRegistration, waitForSpendRegistration) to simulate these complex scenarios.
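As a condensed illustration of the flow described above, here is a hedged sketch written against the harness primitives named in this thread (sendSpend, mineBlocks, triggerReorg, waitForConfRegistration, waitForSpendRegistration); the constructor and the final assertion helper are hypothetical, and the real tests in contractcourt/chain_watcher_coop_reorg_test.go are more thorough.

```go
func TestCoopCloseShallowReorgSketch(t *testing.T) {
	harness := newChainWatcherTestHarness(t) // hypothetical constructor
	closeTx := harness.createCoopCloseTx()   // hypothetical helper

	// The watcher sees the coop close spend and, rather than dispatching,
	// registers for the required number of confirmations.
	harness.sendSpend(closeTx)
	harness.waitForConfRegistration()

	// Accumulate fewer than the required confirmations, then re-org the
	// spend out; the watcher should reset and re-register for spends.
	harness.mineBlocks(2)
	harness.triggerReorg(closeTx, 2)
	harness.waitForSpendRegistration()

	// Re-mine the same tx to full depth and expect the close dispatch.
	harness.sendSpend(closeTx)
	harness.mineBlocks(3)
	harness.assertCoopCloseDispatched() // hypothetical assertion
}
```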
// A reorg removed the spend tx. We reset to initial state and
// wait for ANY new spend (could be the same tx re-mined, or a
// different tx like an RBF replacement).
case reorgDepth, ok := <-negativeConfChan:
Shouldn't we check whether the reorg depth actually removes the tx? What if the tx already has 4 confirmations and gets a 1-block reorg, so we are back to 3? I think we could skip that one. I mean, your approach is pessimistic and works as well, I just wanted to ask whether you considered it.
Shouldn't we check whether the reorg depth actually removes the tx?
What do you mean by this?
What if the tx already has 4 confirmations and gets a 1-block reorg, so we are back to 3? I think we could skip that one.
This fires once the transaction is re-org'd out entirely. So in your case, if you were waiting for 3 confs and 4 blocks have passed, we've already considered the transaction to be sufficiently buried.
// sweep is registered and when the sweeper processes the next block.
// The sweeper uses immediate=false, so it broadcasts on the next block
// after registration. Mine an empty block to trigger the broadcast.
ht.MineEmptyBlocks(1)
Why is this change in this commit? Seems like a flake fix?
It might have been left over from my broader changes to update all the itests to handle the new async behavior. Will take a look at it. Since then I modified the approach to add a 1 conf sync cut out (sync dispatch to make the block beat expectations happy).
// Mine an empty block to trigger the sweep broadcast (same fix as
// above).
ht.MineEmptyBlocks(1)
Also here?
contractcourt/chain_watcher.go (outdated):
// FAST PATH: Check if we should dispatch immediately
// for single-confirmation scenarios.
if c.handleSpendDispatch(spend, "blockbeat") {
	continue
Should we clean up stuff here, @gemini-code-assist?

if confNtfn != nil {
	confNtfn.Cancel()
	confNtfn = nil
}
pendingSpend = nil
continue
Should we also maybe add something like this, so we do not try to fetch the spend on every new blockbeat?

if c.handleSpendDispatch(spend, "blockbeat") {
	spendProcessed = true
Hi @ziggie1984,
Thanks for these excellent questions!
Regarding your first question about cleaning up confNtfn and pendingSpend after the fast path:
Yes, your suggestion to add cleanup here is very insightful and important for correctness. When c.handleSpendDispatch returns true, it signifies that the spend has been fully processed (in the single-confirmation fast path). In this scenario, pendingSpend and confNtfn should indeed be cleared to ensure the closeObserver's state is properly reset and ready for any new spend events. Without this cleanup, pendingSpend could hold a stale value, potentially leading to incorrect duplicate checks or other state inconsistencies if another spend is detected later.
I've included a code suggestion below to incorporate this cleanup.
Regarding your second question about adding a spendProcessed flag to avoid re-fetching the spend:
The handleSpendDetection function already includes logic to ignore duplicate spend detections for the same transaction if pendingSpend is already set. This mechanism helps prevent redundant processing of the same spend. The state variables pendingSpend and confNtfn are designed to track the spend currently awaiting confirmations. If a spend is fully handled (either by the fast path or full confirmation), resetting these variables (as per your first suggestion) effectively prepares the observer for a new, distinct spend, implicitly addressing the concern of re-fetching or re-processing an already handled spend. Therefore, the explicit cleanup of pendingSpend and confNtfn is the more direct and effective way to manage the state in this context.
Let me know if you have any further questions!
Suggested change:
- continue
+ if confNtfn != nil {
+ 	confNtfn.Cancel()
+ 	confNtfn = nil
+ }
+ pendingSpend = nil
+ continue
I think we still need to discuss whether we are OK with all the force-close resolutions now being a bit delayed, since we will no longer trigger the on-chain resolution immediately when the channel has a large close confirmation depth. I think we need to resolve this question before merging this one.
gijswijs left a comment:
This PR was actually simpler to grok than I had anticipated. I like the general approach, but I have made some comments throughout.
My main questions are the following:
- The confirmation delay before dispatching close events could create a tight window for HTLC resolution in certain scenarios. Can we afford to wait for 6 confirmations in each and every scenario?
- Is the juice worth the squeeze to broadcast the HTLC sweeping transactions directly when we see the closing transaction on-chain, but then also handle a reorg that invalidates both the HTLC transactions and the closing transaction?
- Are other transactions still vulnerable to reorgs? For example, what happens if an HTLC sweep transaction gets reorged out, but the closing transaction itself remains confirmed?
Also, as is noted in the PR itself, deep (>=N) reorgs are not fully handled. Consider creating a follow-up issue to track further re-org work.
Last but not least, there are some issues with the commit messages:
// Calculate total stake: channel amount + push amount. The push amount
// represents value at risk for the receiver.
maxChannelSizeMsat := lnwire.NewMSatFromSatoshis(maxChannelSize)
stake := lnwire.NewMSatFromSatoshis(chanAmt) + pushAmt
Maybe I'm not understanding this correctly, but shouldn't the sign be switched here?

stake := lnwire.NewMSatFromSatoshis(chanAmt) - pushAmt

What's at stake for me (as the funder) is the total capacity of the channel minus the amount pushed, because that's what already belongs to my channel peer (maybe a payment for a service upfront).
The godoc comment also defines the push amount as an additional risk, which confuses me as well. To me it looks like you're double counting pushAmt here and could possibly exceed the total channel capacity.
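To make the question above concrete, here are purely illustrative numbers comparing the two formulas; this is not a claim about which one is correct, and it assumes the lnwire and btcutil packages quoted in the diff.

```go
// Illustrative numbers only, comparing the two readings discussed above.
func stakeExamples() (asWritten, asProposed lnwire.MilliSatoshi) {
	chanAmt := btcutil.Amount(1_000_000)           // 1M sat channel
	pushAmt := lnwire.NewMSatFromSatoshis(200_000) // 200k sat pushed to the peer

	// Formula as written in the PR: capacity plus the pushed amount.
	asWritten = lnwire.NewMSatFromSatoshis(chanAmt) + pushAmt // 1.2M sat in msat

	// The reviewer's alternative: capacity minus the pushed amount.
	asProposed = lnwire.NewMSatFromSatoshis(chanAmt) - pushAmt // 0.8M sat in msat

	return asWritten, asProposed
}
```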
// The function should be deterministic, always returning the same
// output for the same input values.
t.Run("determinism", func(t *testing.T) {
Given the func we are testing here, testing for determinism seems like LLM overkill to me. I mean, it doesn't hurt, but it does feel a bit wasteful is what I'm saying.
Happy to remove.
// confirmations required for channel closes. When set, this overrides
// the normal capacity-based scaling. This is only available in
// dev/integration builds for testing purposes.
ChannelCloseConfs fn.Option[uint32]
Why is this override needed if we already have different implementations for CloseConfsForCapacity based on build flags?
This is useful for tests. Then I can just have it wait for 15 confs or w/e and not have to mess with scaling the channel size to target a specific conf target.
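A hedged sketch of how that override could be consulted; the field name comes from the quoted diff, while the exact config location and the fn.Option UnwrapOr call are assumptions for illustration (imports of fn, btcutil, and lnwallet assumed).

```go
// Sketch only: the dev-only override wins when set, otherwise fall back
// to capacity-based scaling.
func closeConfs(devConfs fn.Option[uint32], capacity btcutil.Amount) uint32 {
	return devConfs.UnwrapOr(lnwallet.CloseConfsForCapacity(capacity))
}
```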
// CloseConfsForCapacity should be equivalent to ScaleNumConfs
// with 0 push, but with a minimum of 3 confirmations enforced
// for reorg safety.
closeConfs := CloseConfsForCapacity(btcutil.Amount(capacity))
Yeah, that's cool. You can properly test your funcs in the unit test, but let them have different behavior in integration tests using build flags. You really have to make those implementations of the integration-test-version super clear and simple though. Otherwise you could be chasing integration test bugs in that code, because they weren't captured by the unit tests. So I guess that's the downside of this approach.
You really have to make those implementations of the integration-test-version super clear and simple though.
So I also added a build tag to do something similar for the itests. This lets the itest be mostly unchanged, but then lets me write a test for cases of distinct RBF closes confirming.
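For illustration, a hedged sketch of what such a build-tag split could look like, based on the commit message elsewhere in this PR that says itests use a single confirmation while prod scales with capacity; the build tag name and the exact return value are assumptions.

```go
//go:build integration

package lnwallet

import "github.com/btcsuite/btcd/btcutil"

// Sketch of an itest-only variant: keep the conf count at one so itests
// stay fast and deterministic, while the prod build scales with capacity.
func CloseConfsForCapacity(capacity btcutil.Amount) uint32 {
	return 1
}
```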
// TestChainWatcherCoopCloseRapidReorgs tests that the chain watcher handles
// multiple rapid reorgs in succession without getting into a broken state.
func TestChainWatcherCoopCloseRapidReorgs(t *testing.T) {
My comment here is similar to my comment for the previous test. It's unclear what "rapid reorgs" really means and whether it actually would stress the state machine in a way that tests for anything. To me it's all just a variant of runCoopCloseFlow that will never fail imho.
// testCoopCloseRBFWithReorg tests that when a cooperative close transaction
// is reorganized out during confirmation waiting, the system properly handles
// RBF replacements and re-registration for any spend of the funding output.
func testCoopCloseRBFWithReorg(ht *lntest.HarnessTest) {
Should we also test the following scenario?
The initialCloseTx gets mined first and then re-orged out, and the second time around the firstRbfTx gets mined.
I can tack that on as well.
contractcourt/chain_watcher.go (outdated):
}

// ASYNC PATH: Multiple confirmations (production).
// STATE TRANSITION: None -> Pending (from blockbeat).
nit: add the comment from 15 lines below here as well:
// We've detected a spend, but don't process it yet. Instead,
// register for confirmations to protect against shallow reorgs.
contractcourt/chain_watcher.go (outdated):
// transaction. This can happen if both the spend notification
// and blockbeat detect the same spend.
if pendingSpend != nil {
	if *pendingSpend.SpenderTxHash == *spend.SpenderTxHash {
Doesn't this leave room for a potential race condition, where both the blockbeat and the spend notification detect the same spend simultaneously?
Here we check for a duplicate, so we just continue waiting for the confirmation as the spend hasn't changed.
You're right though that with this set of commits, the role of the block beat notifications is somewhat diminished. I was ignoring them entirely at first, as the intent wasn't totally clear to me. I added them back again once I shifted my approach with the itests.
I think some of the recent deadlock bugs in this area have shown that we need to re-examine some of the design choices and implications of the block beat.
contractcourt/chain_watcher.go (outdated):
if err != nil {
	log.Errorf("Unable to handle spend "+
		"detection: %v", err)
	return
This leaves the channel in what kind of state?
With this PR, 6 is the max. We can compensate for this by either increasing the CLTV delta we advertise (more time to sweep), or tweaking the values in lnd/contractcourt/chain_arbitrator.go, lines 63 to 76 (at 013f5c9).
By this you mean start to sweep immediately, but then add logic to detect re-orgs and try again? One complication with this route is that it's possible for a different commitment transaction to actually be confirmed post re-org. Imagine that both parties force close at the same time, Alice wins initially, but then after the re-org Bob wins. If Alice started to sweep asap, then at this point she's written the entirely wrong set of close summaries and resolvers to disk. IMO it's simpler to just wait for the transaction to be buried sufficiently before trying to sweep.
Yes, all transactions are. This PR operates under a model where we just wait for "long enough" and don't have to worry about a transaction being re-org'd out entirely. If we adopt that model for the sake of this scenario, then we can be sure that the closing transaction that produced the HTLCs being swept stays confirmed in the chain. With that, there are two cases:
For scenario
Yeah, briefly discussed this with @gijswijs today. The default incoming delta is 10. Seems quite risky to waste 60% of that window before attempting to claim HTLCs. Security risks aside, a deadline that short will likely cause huge overpayment of fees because of the steep fee function.
FWIW this delta is only a factor when we actually know the pre-image. But yeah you're right, without further changes in this PR, the default of 10 blocks (the number of blocks before the incoming expiry at which we'll broadcast our commitment) is really more like 4 blocks for channels past the wumbo range. So perhaps this is an argument for increasing this value (independent of this change)? Right now we're in fees-dont-matter zone, but we typically prefer to adopt more pessimistic assumptions.
Once the timelock has expired on an incoming HTLC, what is a reasonable amount to spend on fees to ensure that we don't lose funds due to a one-sided redemption?
We have two versions: for itests, we just use one conf, but in prod, we'll scale the number of confirmations.
This'll be useful for setting up the upcoming itests.
In this commit, we add a new param that'll allow us to scale up the number of confirmations before we act on a new close. We'll use this later to improve the current on chain handling logic.
We want to add better handling, but not break any UIs or wallets. So we'll continue to send out a notification after a single confirmation, then send another after things are fully confirmed.
…re n is num confs In this commit, we update the close logic to handle re-orgs up to the final amount of confirmations. This is done generically, so we're able to handle events such as: coop close confirm, re-org, breach confirm, re-org, force close confirm, re-org, etc. The upcoming set of new tests will exercise all of these cases. We modify the block beat handling to unify the control flow, as it's possible we get the beat, then see the spend, or the other way around.
We'll use this for all the upcoming tests.
All the tests need to send a confirmation _after_ the spend is detected now.
This set of new tests ensures that if we have created N RBF variants of the coop close transaction, any of them can confirm and be re-org'd, with us detecting the final spend once it confirms deeply enough.
In this commit, we add a set of generic close re-org tests. The most important one is the property-based test: it will randomly confirm transactions, generate a re-org, then assert that we eventually detect the final version.
This ensures that during the RBF process, if one confirms, a re-org occurs, then another confirms, that we'll properly detect this case.
…oses In this commit, we add a fast-path optimization to the chain watcher's closeObserver that immediately dispatches close events when only a single confirmation is required (numConfs == 1).

This addresses a timing issue with integration tests that were designed around the old synchronous blockbeat behavior, where close events were dispatched immediately upon spend detection. The recent async confirmation architecture (introduced in commit f6f716a) properly handles reorgs by waiting for N confirmations before dispatching close events. However, this created a race condition in integration tests that mine blocks synchronously and expect immediate close notifications. With the build tag setting numConfs to 1 for itests, the async confirmation notification could arrive after the test already started waiting for the close event, causing timeouts.

We introduce a new handleSpendDispatch method that checks if numConfs == 1 and, if so, immediately calls handleCommitSpend to dispatch the close event synchronously, then returns true to skip the async state machine. This preserves the old behavior for integration tests while maintaining the full async reorg protection for production (where numConfs >= 3). The implementation adds the fast-path check in both spend detection paths (blockbeat and spend notification) to ensure consistent behavior regardless of which detects the spend first.

We also update the affected unit tests to remove their expectation of confirmation registration, since the fast-path bypasses that step entirely. This approach optimizes for the integration test scenario without compromising production safety, as the fast-path only activates when a single confirmation is sufficient - a configuration that only exists in the controlled test environment.
I think we might consider the following approach:
Yes, I think we should increase this value. If we have the preimage we should not play around and should try to immediately resolve this HTLC; if the peer is offline we go on-chain. So maybe something in the 20 range, but we need to be aware of what the min CLTV delta is so that our channels still enforce a value greater than this.
It may actually be worse than this -- if the commitment doesn't confirm right away we have fewer than 4 blocks to get the HTLC transaction confirmed. IIRC the sweeper cuts the deadline in half for the commitment confirmation. If the commitment doesn't actually confirm until that half-deadline, there'd be 5 blocks left to claim the HTLC. By adding a 6 block reorg delay after that, we forfeit the HTLC.
I'd be willing to spend nearly the full value of the HTLC in that case. But if we let the timelock expire, we've lost already, so really we need to prevent that from happening. The slope of the fee function depends on how many blocks the sweeper has to get the transaction confirmed. If we only give it 4 blocks or less, the slope becomes very steep, and a single fee bump could cause us to unnecessarily burn a significant percentage of the HTLC.
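For concreteness, here is the rough arithmetic behind the concern above; all numbers are assumptions taken from this thread (default delta of 10, the sweeper's half-deadline, a 6-conf close wait), not values read from the code.

```go
// Illustrative arithmetic only: blocks left to confirm the HTLC claim if
// the commitment uses its half-deadline and we then wait for 6 confs.
func htlcBlocksLeftAfterReorgWait() int {
	const (
		incomingDelta = 10                // blocks before expiry that we go on-chain
		commitBudget  = incomingDelta / 2 // half-deadline spent confirming the commitment
		closeConfWait = 6                 // worst-case close confirmation wait in this PR
	)

	// 10 - 5 - 6 = -1: the HTLC deadline is already blown.
	return incomingDelta - commitBudget - closeConfWait
}
```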
Maybe? If you do make this change, it definitely needs to be increased though.
In this PR, we address the oldest issue in the lnd tracker: #53. Incremental improvements have landed over the years, but this is the most significant one to date.
First, we start to scale confs for closes just like we do for funding confirmation. A small refactor helps us to re-use this, with additional tests added.
We revamp the chain watcher to implement a state machine to ensure that we recognize the most deeply confirmed transaction closure. Once we detect a spend, we'll register for N confirmations for that spend. If we detect another one (we can get another spend if a re-org happens and a new one confirms), then we'll re-register for confirmations for that spend. We also start to read from the NegativeConf channel, which will be sent on if the transaction is re-org'd out after confirmation.

It's important to note that the logic around the NegativeConf case is incomplete, as this will only trigger if: we wait for 6 confs, then we have a 7 block re-org. In that case, we've typically already claimed all of our outputs on chain. To handle this, we'll actually need to re-create the channel arb and the chain watcher, then re-handle any contested outputs we may have had. If the re-org goes all the way back to the funding transaction, then that's extremely dire, and needs special handling. This isn't covered in this PR.

Most of the diff is actually just tests: unit tests, property based tests (helps us ensure we can handle spend, conf, re-org, spend, conf of nearly arbitrary params -- eg: confirm coop, re-org, confirm force close, etc), integration tests, etc. The core change is in this commit: b1d055e.
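To make the described flow concrete, here is a small self-contained model of the behavior (not lnd's actual code): it only captures the spend / confirmation / re-org transitions sketched above, and the names and the choice to count block inclusion as the first confirmation are modeling assumptions.

```go
package main

import "fmt"

type spendState int

const (
	stateNone spendState = iota
	statePending
	stateConfirmed
)

type observer struct {
	state     spendState
	numConfs  int
	confCount int
	spendTx   string
}

// onSpend: a spend of the funding output was seen. Any previously pending
// spend is replaced (e.g. a re-org let an RBF replacement confirm instead).
func (o *observer) onSpend(tx string) {
	// Modeling choice: inclusion in a block counts as the first conf.
	o.spendTx = tx
	o.confCount = 1
	if o.numConfs <= 1 {
		// Fast path: a single conf is enough, dispatch immediately.
		o.state = stateConfirmed
		return
	}
	o.state = statePending
}

// onBlock: the pending spend gained one confirmation.
func (o *observer) onBlock() {
	if o.state != statePending {
		return
	}
	o.confCount++
	if o.confCount >= o.numConfs {
		o.state = stateConfirmed
	}
}

// onReorg: the pending spend was re-org'd out entirely; reset and wait
// for any new spend (the same tx re-mined, or an RBF replacement).
func (o *observer) onReorg() {
	if o.state != statePending {
		return
	}
	o.state = stateNone
	o.spendTx = ""
	o.confCount = 0
}

func main() {
	o := &observer{state: stateNone, numConfs: 3}

	// Coop close confirms, gets re-org'd out, then a force close wins.
	o.onSpend("coop-close")
	o.onBlock()
	o.onReorg()
	o.onSpend("force-close")
	o.onBlock()
	o.onBlock()

	fmt.Println(o.state == stateConfirmed, o.spendTx) // true force-close
}
```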
This fixes most of #53. Re the second item, if the funding txn gets re-org'd out, then confirmed again, we'll be able to detect that and close it normally. We still need the active logic to detect such a case though.