
Attempt to unblock blocked monitor updates on startup#4520

Open
TheBlueMatt wants to merge 2 commits into lightningdevkit:main from TheBlueMatt:2026-03-startup-mpp-unblock


@TheBlueMatt
Collaborator

When we make an MPP claim we push RAA blockers for each channel to ensure we don't allow any single channel to make too much progress until all channels have the preimage durably on disk. We don't have to store those RAA blockers on disk in the ChannelManager as there's no point: if the ChannelManager gets to disk with the RAA blockers, it also brings with it the pending ChannelMonitorUpdates that contain the preimages, which will then be replayed, ensuring the preimage makes it to all ChannelMonitors.

However, just because those RAA blockers disappear on reload doesn't mean their implications do too: if a later ChannelMonitorUpdate was blocked in the channel, we don't have logic to unblock it on startup.

Here we add such logic, simply attempting to unblock all blocked ChannelMonitorUpdates that existed on startup.
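
To make the failure mode concrete, here is a toy Rust model. Every type and method in it (`ToyChannel`, `ToyManager`, `raa_blockers`, `reload`, `try_unblock`, `on_startup`) is hypothetical and is not LDK's API. It assumes the only thing holding a channel's later monitor update back is an in-memory MPP RAA blocker: a simulated reload drops the blockers but keeps the blocked update, so nothing releases it unless a startup pass explicitly tries.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, heavily simplified channel: just its held-back monitor updates.
#[derive(Debug, Default, Clone)]
struct ToyChannel {
    blocked_updates: Vec<u64>,
    released_updates: Vec<u64>,
}

/// Hypothetical manager. `raa_blockers` models the in-memory MPP RAA blockers,
/// which are never written to disk.
#[derive(Debug, Default, Clone)]
struct ToyManager {
    channels: HashMap<u32, ToyChannel>,
    raa_blockers: HashSet<u32>,
}

impl ToyManager {
    /// Simulated serialize/deserialize round trip: channel state survives,
    /// the RAA blockers do not.
    fn reload(&self) -> ToyManager {
        ToyManager { channels: self.channels.clone(), raa_blockers: HashSet::new() }
    }

    /// Releases all blocked updates for `chan` unless a blocker still holds it.
    fn try_unblock(&mut self, chan: u32) {
        if self.raa_blockers.contains(&chan) {
            return;
        }
        if let Some(c) = self.channels.get_mut(&chan) {
            let blocked = std::mem::take(&mut c.blocked_updates);
            c.released_updates.extend(blocked);
        }
    }

    /// The fix in miniature: on startup, try to unblock every channel that
    /// still has blocked updates.
    fn on_startup(&mut self) {
        let ids: Vec<u32> = self
            .channels
            .iter()
            .filter(|(_, c)| !c.blocked_updates.is_empty())
            .map(|(id, _)| *id)
            .collect();
        for id in ids {
            self.try_unblock(id);
        }
    }
}

fn main() {
    let mut mgr = ToyManager::default();
    // Channel 1 has a later monitor update held back by an MPP RAA blocker.
    mgr.channels.insert(1, ToyChannel { blocked_updates: vec![7], ..Default::default() });
    mgr.raa_blockers.insert(1);

    let mut reloaded = mgr.reload();
    assert!(reloaded.raa_blockers.is_empty());
    // The blocker is gone, but nothing has released update 7 yet.
    assert_eq!(reloaded.channels[&1].blocked_updates, vec![7]);

    reloaded.on_startup();
    assert!(reloaded.channels[&1].blocked_updates.is_empty());
    assert_eq!(reloaded.channels[&1].released_updates, vec![7]);
}
```

Without the `on_startup` pass, the model has no remaining trigger for `try_unblock` on channel 1, which is the stuck state the PR description refers to.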

Code written by Claude.

Fixes #4518

Needs a test; I have a start on one but need to clean it up.

@ldk-reviews-bot

ldk-reviews-bot commented Mar 30, 2026

I've assigned @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@joostjager
Contributor

There was already a unit test in the issue linked in the description. On branch main...joostjager:2026-03-startup-mpp-unblock-with-test it's added as a commit, showing that your fix makes the test pass.

@joostjager
Contributor

Discussed offline and improved the unit test by removing some read-outs whose results were ignored, as well as the 'loop until settled' logic.

When we make an MPP claim we push RAA blockers for each channel to
ensure we don't allow any single channel to make too much progress
until all channels have the preimage durably on disk. We don't have
to store those RAA blockers on disk in the ChannelManager as
there's no point - if the ChannelManager gets to disk with the RAA
blockers it also brought with it the pending ChannelMonitorUpdates
that contain the preimages and will now be replayed, ensuring the
preimage makes it to all ChannelMonitors.

However, just because those RAA blockers disappear on reload
doesn't mean their implications do too - if a later
ChannelMonitorUpdate was blocked in the channel we don't have logic
to unblock it on startup.

Here we add such logic, simply attempting to unblock all blocked
`ChannelMonitorUpdate`s that existed on startup.

Code written by Claude.

Fixes lightningdevkit#4518
TheBlueMatt force-pushed the 2026-03-startup-mpp-unblock branch from a38acca to 6977e25 May 6, 2026 20:38
@TheBlueMatt
Collaborator Author

Thanks! Rebased and included the test.

TheBlueMatt marked this pull request as ready for review May 6, 2026 20:38
ldk-reviews-bot requested a review from joostjager May 6, 2026 20:39
@ldk-claude-review-bot
Collaborator

ldk-claude-review-bot commented May 6, 2026

No issues found.

The implementation is correct and complete. I performed an exhaustive review of:

  1. New BackgroundEvent::AttemptUnblockMonitorUpdates variant: all pattern matches across the codebase are exhaustive.

  2. Deserialization logic (lines 19471-19477): correctly queues the event only for channels with blocked monitor updates, after the staleness check passes, and in the right position within pending_background_events (after MonitorUpdateRegeneratedOnStartup/MonitorUpdatesComplete, before close events).

  3. Event processing (lines 8803-8808): handle_monitor_update_release(counterparty_node_id, channel_id, None) safely gates on raa_monitor_updates_held(), which checks both actions_blocking_raa_monitor_updates (empty on startup) and pending_events for ReleaseRAAChannelMonitorUpdate completion actions. Lock ordering is correct: process_background_events holds only the total_consistency_lock read lock, and handle_monitor_update_release acquires the per_peer_state and per-peer state locks itself.

  4. Assertion at line 9765: AttemptUnblockMonitorUpdates { .. } => false is correct; this test-only assertion looks for a preimage-replay or monitor-completion event for the claiming channel, which AttemptUnblockMonitorUpdates is not.

  5. Serialization: background events are always written with a count of 0 (line 18223), so the new variant is never persisted, matching the "never written to disk" comment.

  6. Test: comprehensive, covering the full reload-and-unblock cycle for a two-channel MPP claim with asymmetric monitor state, and verifying both the fulfill released at startup (channel A) and the fulfill released on event completion (channel B).
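
The queueing behaviour in items 2 and 5 can be sketched with a simplified Rust model. The enum only borrows variant names from the review; its fields, `LoadedChannel`, the `CloseChannel` placeholder, `build_startup_queue` and `serialized_event_count` are all hypothetical. The sketch assumes a strict three-phase order (all regenerated updates, then unblock attempts, then closes), which satisfies the ordering the review describes but may not match how the real deserialization code interleaves events; `MonitorUpdatesComplete` is omitted, and what happens to channels that fail the staleness check is outside the sketch.

```rust
/// Hypothetical, simplified mirror of the background events named in the review.
#[derive(Debug, Clone, PartialEq)]
enum BackgroundEvent {
    MonitorUpdateRegeneratedOnStartup { channel: u32, update_id: u64 },
    AttemptUnblockMonitorUpdates { channel: u32 },
    // Hypothetical placeholder for the close-related events mentioned in item 2.
    CloseChannel { channel: u32 },
}

/// Hypothetical per-channel state recovered during deserialization.
struct LoadedChannel {
    id: u32,
    passed_staleness_check: bool,
    updates_to_replay: Vec<u64>,
    has_blocked_monitor_updates: bool,
}

fn build_startup_queue(channels: &[LoadedChannel], to_close: &[u32]) -> Vec<BackgroundEvent> {
    // Only channels that passed the staleness check are considered here.
    let loaded: Vec<&LoadedChannel> =
        channels.iter().filter(|c| c.passed_staleness_check).collect();
    let mut events = Vec::new();
    // Phase 1: regenerated monitor updates to replay.
    for chan in &loaded {
        for &update_id in &chan.updates_to_replay {
            events.push(BackgroundEvent::MonitorUpdateRegeneratedOnStartup { channel: chan.id, update_id });
        }
    }
    // Phase 2: one unblock attempt, only for channels that actually have blocked updates.
    for chan in &loaded {
        if chan.has_blocked_monitor_updates {
            events.push(BackgroundEvent::AttemptUnblockMonitorUpdates { channel: chan.id });
        }
    }
    // Phase 3: close-related events go last.
    for &channel in to_close {
        events.push(BackgroundEvent::CloseChannel { channel });
    }
    events
}

/// Models item 5: the queue is written with a count of 0 and rebuilt on every load.
fn serialized_event_count(_events: &[BackgroundEvent]) -> u64 {
    0
}

fn main() {
    let channels = vec![
        LoadedChannel { id: 1, passed_staleness_check: true, updates_to_replay: vec![], has_blocked_monitor_updates: true },
        LoadedChannel { id: 2, passed_staleness_check: true, updates_to_replay: vec![5], has_blocked_monitor_updates: true },
        LoadedChannel { id: 3, passed_staleness_check: true, updates_to_replay: vec![], has_blocked_monitor_updates: false },
        LoadedChannel { id: 4, passed_staleness_check: false, updates_to_replay: vec![], has_blocked_monitor_updates: true },
    ];
    let queue = build_startup_queue(&channels, &[9]);
    assert_eq!(
        queue,
        vec![
            BackgroundEvent::MonitorUpdateRegeneratedOnStartup { channel: 2, update_id: 5 },
            BackgroundEvent::AttemptUnblockMonitorUpdates { channel: 1 },
            BackgroundEvent::AttemptUnblockMonitorUpdates { channel: 2 },
            BackgroundEvent::CloseChannel { channel: 9 },
        ]
    );
    assert_eq!(serialized_event_count(&queue), 0);
}
```

Because the queue is reconstructed on every load rather than persisted, adding a variant to it does not affect the on-disk format.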

Add a characterization test for a claimed MPP payment whose
preimage monitor updates are only partially persisted before restart.
The test drives both channels through a held fee-update commitment
dance, claims with async monitor persistence, reloads one fresh and
one stale monitor, and verifies that the bug leaves a sender-side HTLC
stuck after reconnect.
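
The two outcomes this test checks (item 6 of the review: channel A released at startup, channel B only once an event completes) follow from the gate described in item 3. Below is a minimal Rust sketch with hypothetical types: `blockers` stands in for the in-memory RAA blockers (empty after reload), a `PendingEvent` whose `release_channel` is set stands in for an event carrying a ReleaseRAAChannelMonitorUpdate completion action, and each `(channel, update_id)` entry in `held_updates` stands in for a blocked monitor update with a fulfill waiting behind it. None of these names are LDK's.

```rust
use std::collections::HashSet;

/// Hypothetical pending event; when handled, it releases `release_channel`.
#[derive(Debug, Clone, PartialEq)]
struct PendingEvent {
    release_channel: Option<u32>,
}

/// Hypothetical node state right after a reload.
#[derive(Debug, Default)]
struct Node {
    blockers: HashSet<u32>,
    pending_events: Vec<PendingEvent>,
    held_updates: Vec<(u32, u64)>,
    released_updates: Vec<(u32, u64)>,
}

impl Node {
    /// Held if an in-memory blocker exists OR a pending event will release it later.
    fn updates_held(&self, chan: u32) -> bool {
        self.blockers.contains(&chan)
            || self.pending_events.iter().any(|e| e.release_channel == Some(chan))
    }

    /// Releases every held update for `chan`, unless the gate says it is still held.
    fn attempt_unblock(&mut self, chan: u32) {
        if self.updates_held(chan) {
            return;
        }
        let (release, keep): (Vec<_>, Vec<_>) =
            self.held_updates.drain(..).partition(|(c, _)| *c == chan);
        self.held_updates = keep;
        self.released_updates.extend(release);
    }

    /// Handles the oldest pending event, then runs its release action, if any.
    fn handle_next_event(&mut self) {
        if self.pending_events.is_empty() {
            return;
        }
        let event = self.pending_events.remove(0);
        if let Some(chan) = event.release_channel {
            self.attempt_unblock(chan);
        }
    }
}

fn main() {
    // Channel A = 1, channel B = 2. No blockers survive the reload, but B has a
    // pending event whose completion action will release it.
    let mut node = Node::default();
    node.held_updates = vec![(1, 100), (2, 200)];
    node.pending_events.push(PendingEvent { release_channel: Some(2) });

    // Startup pass: attempt to unblock every channel with held updates.
    for chan in [1, 2] {
        node.attempt_unblock(chan);
    }
    assert_eq!(node.released_updates, vec![(1, 100)]);
    assert_eq!(node.held_updates, vec![(2, 200)]);

    // Completing B's event releases B's update.
    node.handle_next_event();
    assert_eq!(node.released_updates, vec![(1, 100), (2, 200)]);
    assert!(node.held_updates.is_empty());
}
```

Gating on pending events as well as blockers means the startup pass cannot release channel B ahead of the event that is supposed to release it.
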
TheBlueMatt force-pushed the 2026-03-startup-mpp-unblock branch from 6977e25 to 52a0030 May 6, 2026 21:01
@codecov

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 92.85714% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 86.12%. Comparing base (5455058) to head (52a0030).
⚠️ Report is 23 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lightning/src/ln/channelmanager.rs | 92.85% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4520   +/-   ##
=======================================
  Coverage   86.11%   86.12%           
=======================================
  Files         157      157           
  Lines      108772   108786   +14     
  Branches   108772   108786   +14     
=======================================
+ Hits        93668    93688   +20     
+ Misses      12487    12484    -3     
+ Partials     2617     2614    -3     
| Flag | Coverage Δ |
|---|---|
| tests | 86.12% <92.85%> (+<0.01%) ⬆️ |
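
The coverage figures above can be cross-checked by hand. This sketch assumes Codecov's ratio is hits / (hits + misses + partials), that the patch had no partial lines, and that the two-decimal values are truncated rather than rounded (92.857...% appears as 92.85%); `coverage_scaled` is a hypothetical helper. Integer division keeps the truncation exact.

```rust
/// hits / (hits + misses + partials), multiplied by `scale` and truncated.
fn coverage_scaled(hits: u64, misses: u64, partials: u64, scale: u64) -> u64 {
    hits * scale / (hits + misses + partials)
}

fn main() {
    // Hits / Misses / Partials from the coverage diff; their sums match the Lines row.
    let (base_hits, base_misses, base_partials) = (93_668, 12_487, 2_617);
    let (head_hits, head_misses, head_partials) = (93_688, 12_484, 2_614);
    assert_eq!(base_hits + base_misses + base_partials, 108_772);
    assert_eq!(head_hits + head_misses + head_partials, 108_786);

    // In hundredths of a percent: 8611 -> 86.11%, 8612 -> 86.12%.
    assert_eq!(coverage_scaled(base_hits, base_misses, base_partials, 10_000), 8_611);
    assert_eq!(coverage_scaled(head_hits, head_misses, head_partials, 10_000), 8_612);

    // In ten-thousandths of a percent the change is 73, i.e. 0.0073 points ("+<0.01%").
    let delta = coverage_scaled(head_hits, head_misses, head_partials, 1_000_000)
        - coverage_scaled(base_hits, base_misses, base_partials, 1_000_000);
    assert_eq!(delta, 73);

    // "92.85714% with 1 line missing" is consistent with 13 of 14 patch lines hit.
    assert_eq!(coverage_scaled(13, 1, 0, 10_000_000), 9_285_714);
}
```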


Development

Successfully merging this pull request may close these issues.

MPP claim HTLC fulfills stuck in holding cell after node restart

4 participants