Skip to content

chancloser: don't set ChanStatusCoopBroadcasted before close tx exists#10782

Open
jtobin wants to merge 4 commits into
lightningnetwork:masterfrom
jtobin:tapd-issue2108
Open

chancloser: don't set ChanStatusCoopBroadcasted before close tx exists#10782
jtobin wants to merge 4 commits into
lightningnetwork:masterfrom
jtobin:tapd-issue2108

Conversation

@jtobin
Copy link
Copy Markdown
Collaborator

@jtobin jtobin commented Apr 30, 2026

Fixes lightninglabs/taproot-assets#2108.

Lightly-edited summary, ctsy Opus, of the issue and its fix:

During close negotiation, BeginNegotiation called MarkCoopBroadcasted(nil) to mark the channel as "coop close in progress" for RPC consistency — but this set ChanStatusCoopBroadcasted without storing any close transaction. If the peer disconnected before the close tx was built, the channel was stranded.

The MarkCoopBroadcasted(nil) call was redundant. ShutdownInfo — already persisted earlier by MarkShutdownSent — serves as the durable signal that shutdown was entered. On reconnect, loadActiveChannels already detects ShutdownInfo on a ChanStatusDefault channel, starts the link normally, and restarts the close negotiation. The fix simply removes the premature status marking, letting the existing ShutdownInfo mechanism handle reconnect correctly.

The same pattern existed in the RBF close path and is removed there too.

Of note: the OP's logs in the linked issue indicate that he landed in the ChanStatusCoopBroadcasted state without any close transaction, which, I believe, could only have occurred if MarkCoopBroadcasted was called with nil. There appear to be no other code paths that set ChanStatusCoopBroadcasted without storing a close tx.

I was pretty curious why those MarkCoopBroadcasted(nil) calls existed, but after investigating them thoroughly, I couldn't really find any good justification for them. Opus's summary of this analysis is also fruitful to include here:

I've gone through every purpose the calls could serve:

RPC visibility: The original comment claimed it was for listchannels consistency. But ListChannels doesn't check ChanStatusCoopBroadcasted — it reports all open channels with their link status. The claim was wrong.

Reconnect restart: loadActiveChannels uses ChanStatusCoopBroadcasted at line 1278 to trigger restartCoopClose. But lines 1367-1487 already do the same thing using ShutdownInfo for ChanStatusDefault channels. Redundant.

HTLC routing prevention: Without the status, the link starts normally. But the link already knows about the shutdown — initChanShutdown disables the channel and the link refuses new HTLCs once Shutdown is sent. Already handled.

BOLT#2 Shutdown retransmission: Already handled by the ShutdownInfo path at lines 1376-1431 (legacy) and 1467-1487 (RBF).

Close tx rebroadcast (chain_arbitrator.go:842): With the nil call, BroadcastedCooperative() returns ErrNoCloseTx — there's nothing to rebroadcast. The nil call just causes a no-op error path.

Crash safety: MarkShutdownSent is called before BeginNegotiation (in initChanShutdown). If the node crashes between the two, only ShutdownInfo is present — which is the state our fix relies on. Removing the nil call actually makes crash recovery more consistent, not less.

PendingChannels RPC: This is the one behavioral change. Previously, a channel moved from ListChannels to PendingChannels.WaitingCloseChannels as soon as shutdown started (even with no close tx). Now it stays in ListChannels (as inactive) until the close tx is actually broadcast. This is more accurate — a channel with no close tx isn't "waiting for a close to confirm."

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where channels could become stranded in a limbo state during cooperative close negotiation if a peer disconnected before the closing transaction was finalized. By removing the premature setting of the ChanStatusCoopBroadcasted flag, the system now relies on the existing ShutdownInfo mechanism to track shutdown progress. This change ensures that channels remain in an active state upon reconnection, allowing the negotiation to resume seamlessly and preventing errors related to missing closed channel summaries.

Highlights

  • Fix for Channel Limbo State: Removed the premature MarkCoopBroadcasted(nil) call during cooperative close negotiation, which previously set ChanStatusCoopBroadcasted without a corresponding close transaction, causing channels to become stranded upon peer disconnection.
  • Improved Crash Recovery: Leverages existing ShutdownInfo (persisted via MarkShutdownSent) as the durable signal for shutdown, ensuring that reconnects correctly resume negotiation without requiring the problematic status flag.
  • Regression Testing: Added a new integration test testCoopCloseResumeAfterRestart and unit tests in peer/brontide_test.go to verify that channels remain active and correctly resume negotiation after restarts.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@github-actions github-actions Bot added the severity-critical Requires expert review - security/consensus critical label Apr 30, 2026
@github-actions
Copy link
Copy Markdown

🔴 PR Severity: CRITICAL

Automated classification | 3 files (non-excluded) | 31 lines changed (non-excluded)

🔴 Critical (2 files)
  • lnwallet/chancloser/chancloser.go - wallet cooperative close logic (lnwallet/*)
  • lnwallet/chancloser/rbf_coop_transitions.go - RBF cooperative close state transitions (lnwallet/*)
🟢 Low (1 file + excluded)
  • docs/release-notes/release-notes-0.22.0.md - release notes (docs/*)
  • itest/list_on_test.go - integration test list (excluded from severity count)
  • itest/lnd_coop_close_reconnect_test.go - integration test (excluded from severity count)
  • lnwallet/chancloser/rbf_coop_test.go - unit test (excluded from severity count)
  • peer/brontide_test.go - unit test (excluded from severity count)

Analysis

This PR modifies lnwallet/chancloser/chancloser.go and lnwallet/chancloser/rbf_coop_transitions.go, which are part of the lnwallet/* package — a CRITICAL package covering wallet operations, channel funding, signing, and commitment transactions. The changes involve cooperative channel close logic and RBF (Replace-by-Fee) cooperative close state transitions.

Severity bump check: 3 non-excluded files (< 20 threshold), 31 non-excluded lines changed (< 500 threshold), and only one distinct critical package touched — no bump applied.

This PR warrants expert review due to the wallet/channel-close logic changes, even though the diff is small. The cooperative close path and RBF transitions directly affect fund safety.


To override, add a severity-override-{critical,high,medium,low} label.
<!-- pr-severity-bot -->

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a bug where ChanStatusCoopBroadcasted was set before a cooperative close transaction actually existed, leading to issues during node restarts. The changes remove the premature marking of the channel as broadcasted in chancloser.go and associated RBF transition logic, ensuring the channel remains in a default state until a transaction is ready. New integration and unit tests were added to verify that channels with ShutdownInfo now correctly resume negotiation after a restart. One review comment regarding an unnecessary variable declaration was identified.

Comment thread lnwallet/chancloser/chancloser.go
@jtobin jtobin requested review from Roasbeef and gijswijs May 1, 2026 08:46
@ziggie1984 ziggie1984 requested review from ziggie1984 and removed request for Roasbeef May 4, 2026 20:10
Copy link
Copy Markdown
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice writeup — the analysis of why MarkCoopBroadcasted(nil, …) is redundant matches what I see tracing readers of ChanStatusCoopBroadcasted through loadActiveChannels, restartCoopClose,
and the chain arbitrator. The fix is correct: ShutdownInfo (persisted by MarkShutdownSent in initChanShutdown) is the right durable signal, and the existing peer/brontide.go:1367-1487
reconnect path already consumes it. A few things before this lands:

The new tests don't actually defend the fix. I verified empirically by cherry-picking the itest commit onto the parent of the fix.

The held HODL HTLC keeps the channel from flushing, so neither BeginNegotiation nor the RBF ChannelFlushing → ClosingNegotiation transition runs before the suspend — the call sites this
PR removes are simply unreached. Same issue with the two new unit tests: TestShutdownInfoChannelStaysActive is tautological (MarkShutdownSent only writes to shutdownInfoKey, can't
mutate chanStatus or the in-memory map by construction), and TestResendChanSyncFailsForCoopBroadcastedLimbo manually creates the limbo via MarkCoopBroadcasted(nil, …) rather than
relying on production code to produce it — so it tests the symptom, not that the fix prevents the state.

Real regression coverage exists on the RBF side, not the legacy side. Removing expectChanPendingClose() from TestRbfChannelFlushingTransitions implicitly turns into a "no unexpected
MarkCoopBroadcasted call" assertion via testify/mock's Called() + AssertExpectations. ✓ But the legacy BeginNegotiation path is unguarded: mockChannel.MarkCoopBroadcasted
(chancloser_test.go:164) is a hand-rolled stub with no call tracking. A regression that re-adds the nil call there would slip through.

Suggestion: drop the itest and the two new unit tests, and add call tracking to the legacy mockChannel — either a markCoopBroadcastedCalls slice with a require.NotContains(t, …, nilTx)
assertion in the existing BeginNegotiation test, or convert it to embed mock.Mock like the RBF side. Few lines, milliseconds to run, actually defends both paths.

Two smaller items:

  • ChannelFlushed.FreshFlush is now dead — only producer (peer/brontide.go:4075) sets it unconditionally to true, and the test loops at rbf_coop_test.go:1785, 1822 no longer
    differentiate between the two values. Worth removing in this PR or a follow-up.
  • The PendingChannels/ListChannels visibility change is acknowledged in the PR description but missing from the release note. Operators with monitoring keyed on those buckets will see
    channels move differently — channels stay in ListChannels (inactive) until the real close tx is broadcast, and WaitingCloseChannel.ClosingTx is no longer ever empty. Worth a sentence.

Comment thread lnwallet/chancloser/chancloser.go
Comment thread peer/brontide_test.go Outdated
Comment thread peer/brontide_test.go Outdated
@ziggie1984
Copy link
Copy Markdown
Collaborator

RPC visibility: The original comment claimed it was for listchannels consistency. But ListChannels doesn't check ChanStatusCoopBroadcasted — it reports all open channels with their link status. The claim was wrong.

Hmm not sure but I think before this change the channel would immediatly move to the pendingClose channel and now with the new change it would only move to the pending close if the close-transaction was actually created which I think is still the right behavior now. A channel can still remain in the listchannels output if the closeTx is not created yet.

@jtobin jtobin force-pushed the tapd-issue2108 branch from 943278a to 5150469 Compare May 6, 2026 14:12
@jtobin jtobin requested a review from ziggie1984 May 6, 2026 17:07
@jtobin
Copy link
Copy Markdown
Collaborator Author

jtobin commented May 7, 2026

(CI errors from the last run look to be unrelated flakes, from what I can tell.)

@ziggie1984
Copy link
Copy Markdown
Collaborator

what about ?

ChannelFlushed.FreshFlush is now dead — only producer (peer/brontide.go:4075) sets it unconditionally to true, and the test loops at rbf_coop_test.go:1785, 1822 no longer
differentiate between the two values. Worth removing in this PR or a follow-up.

@jtobin
Copy link
Copy Markdown
Collaborator Author

jtobin commented May 7, 2026

what about ?

ChannelFlushed.FreshFlush is now dead — only producer (peer/brontide.go:4075) sets it unconditionally to true, and the test loops at rbf_coop_test.go:1785, 1822 no longer
differentiate between the two values. Worth removing in this PR or a follow-up.

Was removed in 5150469. 👍 👍

Copy link
Copy Markdown
Collaborator

@ziggie1984 ziggie1984 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Collaborator

@gijswijs gijswijs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

The commit message of abf09c6 starts with a paragraph that is unclear:

Replace the peer-level unit tests and the integration test (dropped in previous rebase) with call tracking on the legacy mockChannel.MarkCoopBroadcasted stub. The old tests either never reached the removed call sites or were tautological.

In Dutch we would say: I can't make chocolate from that.

  1. I can't make inline comments for code that's untouched, so I'll put it here:
    peer/brontide.go:3629-3635

The leading comment in restartCoopClose reads:

If this channel has status ChanStatusCoopBroadcasted and does not
have a closing transaction, then the cooperative close process was
started but never finished. We'll re-create the chanCloser state
machine and resend Shutdown.

Post-PR, fresh installations cannot reach the "status set + no close tx" state — that was the limbo state being fixed. The branch that handles that case is now exclusively a backwards-compat recovery path for nodes upgrading with pre-existing limbo channels (plus the always-RBF restart path). Without a comment update, the next engineer to touch this code will likely conclude the branch is unreachable and either delete it (breaking upgrade recovery) or be confused by it.

Suggested rewrite:

// If this channel has status ChanStatusCoopBroadcasted and a closing
// transaction was recorded, we just need to rebroadcast (handled by
// the chain arbitrator) and exit. If the status is set but no closing
// tx exists, the channel was left in the pre-PR-#10782 limbo state:
// fall through and re-drive the close negotiation via ShutdownInfo
// or LocalUpfrontShutdownScript.
  1. The interface should explicitly say the tx must be non-nil:

lnwallet/chancloser/interface.go

// MarkCoopBroadcasted persistently marks that the channel close
// transaction has been broadcast. The transaction MUST be non-nil;
// callers must not invoke this until a concrete close tx has been
// constructed. Passing a nil tx historically left channels stranded
// in ChanStatusCoopBroadcasted with no recorded close tx (PR #10782).
MarkCoopBroadcasted(*wire.MsgTx, lntypes.ChannelParty) error
  1. channeldb.markBroadcasted accepts nil silently channeldb/channel.go:2195-2202, markBroadcasted at 2208-2238

The lowest-level helper accepts closeTx == nil and quietly skips the tx-bucket write (line 2217: if closeTx != nil) while still setting the status flag. We could make this an error rather than a silent skip.

  1. Worth exploring is the following: Incoming-HTLC acceptance window on coop-close reconnect (Sorry for the giant copy-pasta)

Locations:

  • Load gate: peer/brontide.go:1253-1300 (non-default-status branch skips addLink)
  • Default-status fall-through to addLink: peer/brontide.go:1367-1487
  • Outgoing-only disable on resume: htlcswitch/link.go:3944-3957PreviouslySentShutdown.WhenSome calls only DisableAdds(Outgoing) at :3946, then markReestablished() at :3961.
  • Incoming gate: htlcswitch/link.go:4017processRemoteUpdateAddHTLC rejects only when IsFlushing(Incoming).
  • Where Incoming actually gets disabled: only on receipt of the peer's Shutdownpeer/brontide.go:5306 (legacy) and lnwallet/chancloser/rbf_coop_transitions.go:330, :444 via ChanObserver.DisableIncomingAddspeer/chan_observer.go:90-104link.DisableAdds(Incoming).

Mechanism. Pre-PR (legacy path), once BeginNegotiation ran, the channel had ChanStatusCoopBroadcasted set. On reconnect, loadActiveChannels took the non-default-status branch at brontide.go:1253, called restartCoopClose, and continued at :1299the link was never registered, so no update_add_htlc could reach it. Post-PR, the channel stays ChanStatusDefault throughout shutdown negotiation, so loadActiveChannels falls through to addLink at :1444. resumeLink synchronously calls DisableAdds(Outgoing) before markReestablished(), but never calls DisableAdds(Incoming) — that only runs when we process the peer's resent Shutdown.

Pre-existing for RBF. The PR-removed call in RBF (ChannelFlushingMarkCoopBroadcasted(nil, true)) was gated on FreshFlush. For RBF closes of already-clean channels (FreshFlush=false), the status was never set early, so this window already existed pre-PR. This PR extends the same window to legacy and to the FreshFlush=true RBF case.

Impact. A spec-violating peer can push an update_add_htlc between our reconnect-time link resume and our receipt of their resent Shutdown. We accept it. No fund loss — the HTLC either succeeds (we get paid) or is cancelled back. The plausible griefing angle is delaying coop close completion: legacy closing_signed exchange requires a clean channel, so an HTLC the peer pushes during the window must resolve first; combined with repeated disconnect/reconnect or a stuck HTLC this could push the close toward force-close. A spec-following peer never sends update_add_htlc after sending Shutdown, so practical attack surface is small.

PoC (security-auditor original, with one correction: update_add_htlc and commit_sig are separate messages, not nested):

Alice + malicious Bob, legacy coop close.
1. Bob → Alice: Shutdown. Alice replies with Shutdown; MarkShutdownSent.
2. Bob disconnects before sending ClosingSigned.
3. Alice loads channel as ChanStatusDefault (post-PR), registers the link.
4. Alice's resumeLink resends Shutdown via PreviouslySentShutdown and
   calls DisableAdds(Outgoing). Incoming is NOT disabled. Link is then
   markReestablished().
5. Bob reconnects and sends update_add_htlc *before* his own Shutdown.
   Alice's processRemoteUpdateAddHTLC sees IsFlushing(Incoming)==false
   and accepts the HTLC.
6. Bob then sends Shutdown; DisableIncomingAdds fires — too late.

Remediation options:

  • In loadActiveChannels (post-addLink) when ShutdownInfo is present, also call link.DisableAdds(Incoming) — mirror the outgoing disable already in resumeLink. Concretely: extend PreviouslySentShutdown.WhenSome at link.go:3944 to disable both directions, or invoke it from the caller after addLink returns.
  • Alternatively, disconnect on receiving update_add_htlc while local ShutdownInfo is set, regardless of whether the peer's Shutdown has arrived.
  • Add a regression itest: closechannel → disconnect → reconnect → malicious peer sends UpdateAddHTLC before its own Shutdown → assert HTLC rejected.

Not sure if it's worth the effort tho.

Apart from that only small nits inline.

Comment thread lnwallet/chancloser/rbf_coop_test.go Outdated
Comment thread lnwallet/chancloser/chancloser_test.go
Comment thread lnwallet/chancloser/rbf_coop_test.go Outdated
Comment thread lnwallet/chancloser/rbf_coop_test.go Outdated
@jtobin
Copy link
Copy Markdown
Collaborator Author

jtobin commented May 11, 2026

@gijswijs I'm investigating the potential griefing attack now; looks very minor, but if the mitigation is simple and its correctness/righteousness is easy to ascertain, I can address it here.

(EDIT: yeah, looks like an easy fix. Need to patch up a couple of comments too; will do all that tomorrow.)

@jtobin
Copy link
Copy Markdown
Collaborator Author

jtobin commented May 13, 2026

@gijswijs Re: the griefing attack: it's obviously very minor (spec-violating peer can delay coop close; force close remains available), and it seems the "easy" mitigation could cause more severe problems. E.g. if we just disable incoming HTLC adds after persisting Shutdown, and then crash before sending the Shutdown message, then we could reject spec-observant HTLC adds upon restart, since the peer didn't get the message.

Given the problem is minor, and the easy solution to it is arguably worse, I'm inclined to ignore it here.

Rest of your feedback has been addressed!

@jtobin jtobin requested a review from gijswijs May 13, 2026 19:24
jtobin added 4 commits May 14, 2026 10:20
Remove the two call sites that set ChanStatusCoopBroadcasted
before a cooperative close transaction exists:

 - BeginNegotiation in the legacy close path (chancloser.go)
 - ChannelFlushed handling in the RBF close path
   (rbf_coop_transitions.go)

Both calls passed nil as the close tx, creating a "limbo" state
where ChanStatusCoopBroadcasted is set but no close transaction
is stored. This is unnecessary because ShutdownInfo — persisted
earlier by MarkShutdownSent in initChanShutdown / the RBF
ShutdownPending transition — already serves as the durable
signal that the shutdown flow was entered.

ChanStatusCoopBroadcasted should only be set when a real close
transaction exists, which this change preserves.
Add call tracking to the legacy mockChannel so that every
MarkCoopBroadcasted invocation is recorded. TestTaprootFastClose
now asserts that at least one call was made and that every call
carried a non-nil tx, guarding against the limbo state described
in lightninglabs/taproot-assets#2108.
FreshFlush is never read in any transition handler. The only
producer (peer/brontide.go) sets it unconditionally to true,
and after the previous commit removed expectChanPendingClose,
the test loops that iterated over {true, false} no longer
differentiate between the two values.

Remove the field, the unconditional assignment, and collapse
the test loops into single sub-tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

severity-critical Requires expert review - security/consensus critical

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug]: Closed channel status ChanStatusCoopBroadcasted|ChanStatusRemoteCloseInitiator without closing_txid

3 participants