Defer ChainMonitor updates and persistence to flush() #4345

joostjager · 2026-01-26T09:55:50Z

Summary

Introduce DeferredChainMonitor, a wrapper around ChainMonitor that queues watch_channel and update_channel operations and returns InProgress until flush() is called. This allows monitor updates to be persisted in a batch after the ChannelManager has been persisted, so that on restart the persisted monitor state is never ahead of the persisted ChannelManager state.

The Problem

There is a race condition in persistence ordering: if the node crashes after writing the channel monitors but before writing the channel manager, the monitors are ahead of the manager on restart. The resulting state desync can lead to channel force closures.

The Solution

By deferring monitor writes until after the channel manager is persisted (via flush()), we ensure the manager is always at least as up-to-date as the monitors.
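
To make the ordering argument concrete, here is a minimal toy model (not LDK code). `Disk`, `WriteOrder`, and `persist_with_crash` are hypothetical names invented for this illustration. It tracks only the latest update id each component has written and "crashes" after a chosen number of writes.

```rust
/// Toy on-disk state: the latest update id each component has persisted.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Disk {
    manager_update_id: u64,
    monitor_update_id: u64,
}

/// Which component is written first in a persistence round.
#[derive(Clone, Copy)]
enum WriteOrder {
    MonitorFirst,
    ManagerFirst,
}

/// Persists `new_id` for both components in the given order, stopping
/// ("crashing") once `writes_before_crash` writes have completed.
fn persist_with_crash(
    mut disk: Disk,
    new_id: u64,
    order: WriteOrder,
    writes_before_crash: usize,
) -> Disk {
    for step in 0..writes_before_crash.min(2) {
        let write_monitor = match order {
            WriteOrder::MonitorFirst => step == 0,
            WriteOrder::ManagerFirst => step == 1,
        };
        if write_monitor {
            disk.monitor_update_id = new_id;
        } else {
            disk.manager_update_id = new_id;
        }
    }
    disk
}

/// The state this PR wants to rule out on restart.
fn monitor_ahead(disk: &Disk) -> bool {
    disk.monitor_update_id > disk.manager_update_id
}
```

With `WriteOrder::MonitorFirst`, a crash after the first write leaves the monitor ahead of the manager; with `WriteOrder::ManagerFirst`, no crash point does.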

Key changes:

- DeferredChainMonitor queues monitor operations and returns InProgress
- Calling flush() applies pending operations and persists monitors
- All ChainMonitor traits (Listen, Confirm, EventsProvider, etc.) are passed through, allowing drop-in replacement
- Background processor updated to capture pending count before ChannelManager persistence, then flush after persistence completes
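
The queue-then-flush flow described in the key changes can be sketched as follows. This is a simplified stand-in, not the code in this PR: `ToyDeferredMonitor`, `PendingOp`, `UpdateStatus`, `pending_count`, `persist_round`, and the `flush(count)` signature are all assumptions made for illustration. Presumably the count is captured first because operations queued while the manager is being written are not reflected in that manager snapshot, so they should wait for the next round.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Stand-in for the status returned to the caller; only the variant used here.
#[derive(Debug, PartialEq)]
enum UpdateStatus {
    InProgress,
}

/// A queued monitor operation, reduced to ids for illustration.
#[derive(Debug, Clone, PartialEq)]
enum PendingOp {
    WatchChannel { channel_id: u32 },
    UpdateChannel { channel_id: u32, update_id: u64 },
}

/// Toy wrapper: `applied` stands in for the inner monitor and its persister.
#[derive(Default)]
struct ToyDeferredMonitor {
    pending: Mutex<VecDeque<PendingOp>>,
    applied: Mutex<Vec<PendingOp>>,
}

impl ToyDeferredMonitor {
    /// Queues the operation instead of applying it.
    fn watch_channel(&self, channel_id: u32) -> UpdateStatus {
        let op = PendingOp::WatchChannel { channel_id };
        self.pending.lock().unwrap().push_back(op);
        UpdateStatus::InProgress
    }

    /// Queues the operation instead of applying it.
    fn update_channel(&self, channel_id: u32, update_id: u64) -> UpdateStatus {
        let op = PendingOp::UpdateChannel { channel_id, update_id };
        self.pending.lock().unwrap().push_back(op);
        UpdateStatus::InProgress
    }

    fn pending_count(&self) -> usize {
        self.pending.lock().unwrap().len()
    }

    /// Applies at most `count` queued operations, oldest first; returns how many.
    fn flush(&self, count: usize) -> usize {
        let mut pending = self.pending.lock().unwrap();
        let n = count.min(pending.len());
        let batch: Vec<PendingOp> = pending.drain(..n).collect();
        drop(pending); // release the queue lock before "persisting"
        self.applied.lock().unwrap().extend(batch);
        n
    }
}

/// One background-processor round: snapshot the count, persist the manager,
/// then flush only the operations that were queued before the snapshot.
fn persist_round(monitor: &ToyDeferredMonitor, persist_manager: impl FnOnce()) -> usize {
    let count = monitor.pending_count();
    persist_manager();
    monitor.flush(count)
}
```

In this sketch, an operation queued while `persist_manager` runs stays in the queue until the following round.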

Alternative Designs Considered

Several approaches were explored to solve the monitor/manager persistence ordering problem:

1. Queue at KVStore level (#4310)

Introduces a QueuedKVStoreSync wrapper that queues all writes in memory, committing them in a single batch at chokepoints where data leaves the system (get_and_clear_pending_msg_events, get_and_clear_pending_events). This approach aims for true atomic multi-key writes but requires KVStore backends that support transactions (e.g., SQLite) - filesystem backends cannot achieve full atomicity.

Trade-offs: Most general solution but requires changes to persistence boundaries and cannot fully close the desync gap with filesystem storage.
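
A rough sketch of the store-level queuing idea, under stated assumptions: `ToyKvStore`, `MemoryStore`, `QueuedStore`, and `commit` are hypothetical names, and the trait is far simpler than LDK's KVStoreSync. The batch is only as atomic as the backend's `write_batch`; here an in-memory map plays the role of a transactional backend.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, simplified store interface with a batch-write entry point.
trait ToyKvStore {
    fn write_batch(&self, entries: Vec<(String, Vec<u8>)>);
    fn read(&self, key: &str) -> Option<Vec<u8>>;
}

/// In-memory backend; holding the lock for the whole batch makes it all-or-nothing.
#[derive(Default)]
struct MemoryStore {
    data: Mutex<HashMap<String, Vec<u8>>>,
}

impl ToyKvStore for MemoryStore {
    fn write_batch(&self, entries: Vec<(String, Vec<u8>)>) {
        let mut data = self.data.lock().unwrap();
        for (key, value) in entries {
            data.insert(key, value);
        }
    }

    fn read(&self, key: &str) -> Option<Vec<u8>> {
        self.data.lock().unwrap().get(key).cloned()
    }
}

/// Wrapper that buffers writes in memory until `commit` is called.
struct QueuedStore<S: ToyKvStore> {
    inner: S,
    queue: Mutex<Vec<(String, Vec<u8>)>>,
}

impl<S: ToyKvStore> QueuedStore<S> {
    fn new(inner: S) -> Self {
        Self { inner, queue: Mutex::new(Vec::new()) }
    }

    /// Buffers the write; nothing reaches the backend yet.
    fn write(&self, key: &str, value: &[u8]) {
        self.queue.lock().unwrap().push((key.to_string(), value.to_vec()));
    }

    /// Hands all buffered writes to the backend as one batch; returns the batch size.
    fn commit(&self) -> usize {
        let entries = std::mem::take(&mut *self.queue.lock().unwrap());
        let count = entries.len();
        self.inner.write_batch(entries);
        count
    }
}
```

In the design described above, `commit` would be called at the chokepoints where data leaves the system.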

2. Queue at Persister level (#4317)

Updates MonitorUpdatingPersister to queue persist operations in memory, with actual writes happening on flush(). Adds flush() to the Persist trait and ChainMonitor.

Trade-offs: Only fixes the issue for MonitorUpdatingPersister; custom Persist implementations remain vulnerable to the race condition.
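
The trade-off can be illustrated with a pared-down trait. Everything here is hypothetical: `ToyPersist`, `QueuingPersister`, and `EagerPersister` are invented names, and the default no-op `flush` is an assumption about how such a method might be added without breaking existing implementations, not a description of #4317.

```rust
use std::cell::RefCell;

/// Hypothetical, pared-down persistence trait; `flush` defaults to a no-op
/// so that existing implementations keep compiling.
trait ToyPersist {
    fn persist(&self, update_id: u64);
    fn flush(&self) {}
    fn on_disk(&self) -> Vec<u64>;
}

/// Defers writes until `flush`, like the queuing persister described above.
#[derive(Default)]
struct QueuingPersister {
    queued: RefCell<Vec<u64>>,
    disk: RefCell<Vec<u64>>,
}

impl ToyPersist for QueuingPersister {
    fn persist(&self, update_id: u64) {
        self.queued.borrow_mut().push(update_id);
    }

    fn flush(&self) {
        let mut queued = self.queued.borrow_mut();
        self.disk.borrow_mut().append(&mut *queued);
    }

    fn on_disk(&self) -> Vec<u64> {
        self.disk.borrow().clone()
    }
}

/// A custom implementation that writes immediately and inherits the no-op `flush`.
#[derive(Default)]
struct EagerPersister {
    disk: RefCell<Vec<u64>>,
}

impl ToyPersist for EagerPersister {
    fn persist(&self, update_id: u64) {
        self.disk.borrow_mut().push(update_id);
    }

    fn on_disk(&self) -> Vec<u64> {
        self.disk.borrow().clone()
    }
}

/// Returns what has reached "disk" after `persist` but before any `flush`.
fn written_before_flush(persister: &dyn ToyPersist) -> Vec<u64> {
    persister.persist(7);
    persister.on_disk()
}
```

In this model the eager persister has already written update 7 before `flush` is ever called, which is the ordering gap that remains for custom implementations.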

3. Queue internally in ChainMonitor (#4351)

Modifies ChainMonitor directly to queue operations internally, returning InProgress until flush() is called.

Trade-offs: Requires a very large number of test changes, since existing tests expect monitor updates to be persisted immediately.

ldk-reviews-bot · 2026-01-26T09:55:53Z

👋 Hi! I see this is a draft PR.
I'll wait to assign reviewers until you mark it as ready for review.
Just convert it out of draft status when you're ready for review!

joostjager · 2026-01-26T10:50:28Z

Added a DeferredChainMonitor wrapper instead of modifying ChainMonitor directly. The wrapper intercepts watch_channel and update_channel calls, queues them, and returns InProgress. When flush is called, it processes the queued operations and persists them in the correct order after ChannelManager persistence. This approach keeps ChainMonitor unchanged so that existing tests which expect synchronous behavior continue to work without modification. Only the background processor and production code paths use the deferred wrapper while the test suite can keep using ChainMonitor directly.

joostjager · 2026-01-26T14:04:57Z

I initially attempted to implement this as a thin adapter that would sit between the ChannelManager and an existing ChainMonitor, forwarding calls while deferring the Watch operations. However, when integrating with ldk-node, this approach quickly ran into Rust ownership and lifetime issues, since it required keeping both the original ChainMonitor and the wrapper around simultaneously. The current implementation takes a simpler approach: DeferredChainMonitor owns its own ChainMonitor internally and implements Deref to it, making it a complete drop-in replacement that can be instantiated with the same parameters as ChainMonitor while exposing all the same traits and methods.
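
The owning-wrapper-with-Deref pattern looks roughly like this. `InnerMonitor`, `DeferredWrapper`, `best_height`, and `queue_update` are hypothetical names for illustration; the real types carry many generic parameters that are omitted here.

```rust
use std::ops::Deref;

/// Stand-in for the wrapped type.
struct InnerMonitor {
    best_height: u32,
}

impl InnerMonitor {
    fn new(best_height: u32) -> Self {
        Self { best_height }
    }

    fn best_height(&self) -> u32 {
        self.best_height
    }
}

/// Wrapper that owns the inner monitor, so only one value needs to be kept alive.
struct DeferredWrapper {
    inner: InnerMonitor,
    pending: Vec<u64>,
}

impl DeferredWrapper {
    /// Takes the same parameters as the inner type's constructor.
    fn new(best_height: u32) -> Self {
        Self { inner: InnerMonitor::new(best_height), pending: Vec::new() }
    }

    /// Wrapper-specific behaviour: queue instead of applying.
    fn queue_update(&mut self, update_id: u64) {
        self.pending.push(update_id);
    }
}

impl Deref for DeferredWrapper {
    type Target = InnerMonitor;

    fn deref(&self) -> &InnerMonitor {
        &self.inner
    }
}

/// Code written against the inner type accepts `&DeferredWrapper` via deref coercion.
fn report_height(monitor: &InnerMonitor) -> u32 {
    monitor.best_height()
}
```

Because the wrapper owns the inner value, callers hold a single object, and inherent methods of the inner type remain reachable through auto-deref.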

codecov · 2026-01-26T15:54:27Z

Codecov Report

❌ Patch coverage is 86.48111% with 68 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.08%. Comparing base (8cdc86a) to head (d142660).
⚠️ Report is 15 commits behind head on main.

Files with missing lines	Patch %	Lines
lightning/src/chain/deferred.rs	87.78%	46 Missing and 7 partials ⚠️
lightning-background-processor/src/lib.rs	78.26%	11 Missing and 4 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4345      +/-   ##
==========================================
- Coverage   86.53%   86.08%   -0.45%     
==========================================
  Files         158      157       -1     
  Lines      103190   102897     -293     
  Branches   103190   102897     -293     
==========================================
- Hits        89300    88584     -716     
- Misses      11469    11807     +338     
- Partials     2421     2506      +85

Flag	Coverage Δ
fuzzing	`?`
tests	`86.08% <86.48%> (+0.25%)`	⬆️


Introduce a `DeferredChainMonitor` wrapper around `ChainMonitor` that queues `watch_channel` and `update_channel` operations, returning `InProgress` until `flush()` is called. This enables batched persistence of monitor updates after `ChannelManager` persistence, so that on restart the persisted monitor state is never ahead of the persisted `ChannelManager` state.

Key changes:
- `DeferredChainMonitor` queues monitor operations and returns `InProgress`
- Calling `flush()` applies pending operations and persists monitors
- All `ChainMonitor` traits (Listen, Confirm, EventsProvider, etc.) are passed through, allowing drop-in replacement
- Background processor updated to capture pending count before `ChannelManager` persistence, then flush after persistence completes

Includes comprehensive tests covering the full channel lifecycle with payment flows using `DeferredChainMonitor`.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The test assumed `read_dir` would return files in a specific order (`.tmp` files first), but the Rust documentation states: "The order in which this iterator returns entries is platform and filesystem dependent." When the real monitor file was returned before a `.tmp` file, the assertion `mons.next().is_none()` would fail.

Fix by adding a helper function that filters out `.tmp` files entirely, making the tests robust to any file ordering.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
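
The helper added by this commit is not shown on this page. A minimal sketch of such a filter, with the hypothetical name `list_non_tmp_files`, could look like this. Sorting the result is an extra step added here so that assertions do not depend on directory iteration order.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Lists the regular files in `dir`, skipping any with a `.tmp` extension.
/// The result is sorted so callers never depend on `read_dir` ordering.
fn list_non_tmp_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let is_tmp = path.extension().map_or(false, |ext| ext == "tmp");
        if path.is_file() && !is_tmp {
            files.push(path);
        }
    }
    files.sort();
    Ok(files)
}
```

A test can then assert on the exact set of returned paths rather than on the position of a particular entry in the iterator.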

joostjager changed the title ~~Chain mon deferred writes~~ Defer ChainMonitor updates and persistence to flush() Jan 26, 2026

joostjager force-pushed the chain-mon-deferred-writes branch 3 times, most recently from 36a8b33 to 73c0a66 Compare January 26, 2026 13:59

joostjager force-pushed the chain-mon-deferred-writes branch 2 times, most recently from 5bd0ea3 to 0c005d0 Compare January 26, 2026 14:08

joostjager force-pushed the chain-mon-deferred-writes branch from 0c005d0 to bc1b327 Compare January 27, 2026 11:29

This was referenced Jan 27, 2026

Batched persistence with a queuing KVStore (PoC) #4310 (Closed)

Defer MonitorUpdatingPersister writes to flush() #4317 (Closed)

Defer ChainMonitor updates and persistence to flush() #4351 (Closed)

joostjager and others added 2 commits January 27, 2026 13:28

joostjager force-pushed the chain-mon-deferred-writes branch from db134e7 to d142660 Compare January 27, 2026 12:28
