feat: stale channel monitors recovery #502

Open
ben-kaufman wants to merge 9 commits into release-2.1.2 from fix/stale-monitor-recovery-release

Conversation

@ben-kaufman
Contributor

Summary

  • Bumps ldk-node to rc.34 (153ecbe) which includes stale monitor recovery + commitment secrets reset
  • On BuildError.DangerousValue, automatically retries build with accept_stale_channel_monitors enabled
  • Adds required connectionTimeoutSecs: 10 for ElectrumSyncConfig (new in rc.34)

Matches Android PR #855 approach (always retry on DangerousValue, no one-shot flag needed).
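
The retry described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `StubBuildError`, `NodeBuilding`, `setAcceptStaleChannelMonitors`, and `FakeBuilder` are hypothetical stand-ins, since the real types and signatures come from the ldk-node Swift bindings and are not shown here.

```swift
// Hypothetical stand-ins: the real error enum and builder come from the
// ldk-node Swift bindings, whose exact signatures are not shown in this PR.
enum StubBuildError: Error {
    case readFailed      // generic failure; must not trigger recovery
    case dangerousValue  // dedicated stale channel monitor variant
}

protocol NodeBuilding: AnyObject {
    // Assumed setter mirroring the accept_stale_channel_monitors flag.
    func setAcceptStaleChannelMonitors(_ accept: Bool)
    // The real build returns a node handle; a String keeps the sketch small.
    func build() throws -> String
}

/// Builds once; on the stale-monitor error only, enables the recovery flag
/// and builds a second time. Any other error propagates to the caller.
func buildWithStaleMonitorRecovery(_ builder: NodeBuilding) throws -> String {
    do {
        return try builder.build()
    } catch StubBuildError.dangerousValue {
        builder.setAcceptStaleChannelMonitors(true)
        return try builder.build()
    }
}

/// Test double that fails with dangerousValue until the flag is enabled.
final class FakeBuilder: NodeBuilding {
    private(set) var acceptStale = false
    private(set) var attempts = 0
    private let monitorsAreStale: Bool

    init(monitorsAreStale: Bool) {
        self.monitorsAreStale = monitorsAreStale
    }

    func setAcceptStaleChannelMonitors(_ accept: Bool) {
        acceptStale = accept
    }

    func build() throws -> String {
        attempts += 1
        if monitorsAreStale && !acceptStale {
            throw StubBuildError.dangerousValue
        }
        return "node"
    }
}
```

With a stale `FakeBuilder` the function makes two attempts and leaves the flag enabled; with a healthy one it makes a single attempt and never touches the flag, which is why no persisted one-shot flag is needed in this model.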

How to test

  1. Reproduce the stale monitor state (overwrite a channel monitor in VSS with an older update_id)
  2. Launch the app — first build fails with DangerousValue
  3. Verify the automatic retry succeeds and the node starts
  4. Check logs for "Build failed with DangerousValue" followed by "build succeeded with accept_stale"
  5. Keep app open ~15s, verify "all monitors healed" in logs
  6. Kill and relaunch — verify normal startup (no retry needed since monitors are now healed)
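
Steps 4 and 5 amount to checking that certain messages appear in the logs in a fixed order. A small helper along these lines could automate that check against an exported log file; the function name `containsInOrder` is an assumption made for illustration, while the marker strings are the ones quoted in the steps above.

```swift
import Foundation

/// Returns true when every marker occurs in `log`, each one after the
/// end of the previous match.
func containsInOrder(_ markers: [String], in log: String) -> Bool {
    var searchStart = log.startIndex
    for marker in markers {
        guard let match = log.range(of: marker, range: searchStart..<log.endIndex) else {
            return false
        }
        // Continue searching only after the text that was just matched.
        searchStart = match.upperBound
    }
    return true
}

// Marker strings taken from the test steps.
let recoveryMarkers = [
    "Build failed with DangerousValue",
    "build succeeded with accept_stale",
    "all monitors healed",
]
```

Calling `containsInOrder(recoveryMarkers, in: logText)` on the first launch after reproducing the stale state should return true if recovery ran as described; on the relaunch in step 6 the first marker should be absent.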

Dependencies

🤖 Generated with Claude Code

ben-kaufman and others added 5 commits March 19, 2026 07:56
On BuildError.ReadFailed (likely stale ChannelMonitor from migration
overwrite), automatically retry once with accept_stale_channel_monitors
enabled. The ldk-node recovery flag force-syncs the monitor's update_id
and heals commitment state via a delayed chain sync + keysend round-trip.

A persisted UserDefaults flag ensures this only triggers once — set on
any successful build (affected or not), preventing future retries.

Depends on: synonymdev/ldk-node#76

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ReadFailed fires for 19+ code paths (KVStore errors, deserialization
failures, etc). DangerousValue is the dedicated variant that only fires
for the specific stale channel monitor case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the persisted staleMonitorRecoveryAttempted flag and always retry
on DangerousValue. The flag was unnecessary — once monitors are healed,
DangerousValue never fires again on subsequent startups. This matches
the simpler approach in bitkit-android PR #855.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update from c5698d0 (pre-rc.33, no monitor overwrite protection) to
153ecbe (rc.34) which includes:
- accept_stale_channel_monitors flag
- BuildError.DangerousValue variant
- Commitment secrets reset on force_set_latest_update_id
- Delayed chain sync with keysend-based healing
- Sentinel skip in provide_secret for reset trees

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rc.34 bindings require connectionTimeoutSecs as a non-optional field.
Set to 10 seconds matching Android PR #855.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ben-kaufman ben-kaufman force-pushed the fix/stale-monitor-recovery-release branch from 65037ab to 1b9fea2 Compare March 18, 2026 23:56
@ben-kaufman ben-kaufman changed the base branch from release-2.1.1 to release-2.1.2 March 18, 2026 23:56
@ben-kaufman ben-kaufman marked this pull request as ready for review March 18, 2026 23:57
@ben-kaufman ben-kaufman requested review from jvsena42 and ovitrif March 18, 2026 23:57
The exhaustive switch on BuildError was missing the new DangerousValue
variant from rc.34, which would cause a compile error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
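
To illustrate why the missing variant is a compile-time problem: a Swift `switch` over an enum with no `default` branch must name every case, so a newly added case breaks the build until it is handled. The enum and messages below are hypothetical stand-ins covering only two variants, not the real `BuildError` from the bindings.

```swift
// Hypothetical two-case stand-in for the bindings' BuildError.
enum BuildErrorStandIn: Error {
    case readFailed
    case dangerousValue  // the variant added in rc.34
}

/// Exhaustive switch with no `default`: removing either branch makes
/// this function fail to compile ("switch must be exhaustive").
func userFacingMessage(for error: BuildErrorStandIn) -> String {
    switch error {
    case .readFailed:
        return "Failed to read node data"
    case .dangerousValue:
        return "Stale channel monitors detected"
    }
}
```

Omitting `default` is a deliberate trade-off here: it costs a compile error on every bindings upgrade that adds a variant, but it guarantees new errors are never silently lumped into a generic message.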

Collaborator

@ovitrif ovitrif left a comment

utACK

@piotr-iohk
Collaborator

Tested:

  • restore broken 2.1.0 wallet into 2.1.2 ✅
  • update 2.1.0 with broken wallet to 2.1.2 ✅
  • update from 2.0.6 (wallet with gap) to 2.1.2 ✅
  • update from 2.0.6 (wallet with gap) -> 2.1.1 -> 2.1.2 ✅
  • update 2.1.0 with healthy wallet to 2.1.2 (regression check) ✅
  • 2.1.0 with broken wallet (advanced 600 blocks) -> update to 2.1.2 ✅

The wallet was operational each time. LN payments were sent and received successfully.

Member

@jvsena42 jvsena42 left a comment

LGTM, testing...

Member

@jvsena42 jvsena42 left a comment

✅ Reproduced the bug: RN -> open channel -> v2.0.6 -> 21 payments -> v2.1.0 -> Error
✅ Happy path: RN -> open channel -> migrated to fix/stale-monitor-recovery-release
✅ Heal path: RN -> open channel -> v2.0.6 -> 21 payments -> v2.1.0 -> Error -> checkout to fix/stale-monitor-recovery-release -> recover channel

Recovery logs:

ERROR❌:  Without the latest ChannelMonitor we cannot continue without risking funds. - [LDK] [lightning::ln::channelmanager:17388] [Logger.swift: log(record:) line: 177]
ERROR❌:  Please ensure the chain::Watch API requirements are met and file a bug report at https://github.com/lightningdevkit/rust-lightning - [LDK] [lightning::ln::channelmanager:17389] [Logger.swift: log(record:) line: 177]
ERROR❌: Channel manager deserialization returned DangerousValue (stale channel monitors). Use set_accept_stale_channel_monitors(true) to recover: Value would be dangerous to continue execution with - [LDK] [ldk_node::builder:2026] [Logger.swift: log(record:) line: 177]
WARN⚠️: Build failed with DangerousValue. Retrying with accept_stale_channel_monitors for recovery. - Recovery [LightningService.swift: setup(walletIndex:electrumServerUrl:rgsServerUrl:channelMigration:) line: 147]
DEBUG: Loaded network graph from local cache with RGS timestamp 0 - [LDK] [ldk_node::builder:1802] [Logger.swift: log(record:) line: 171]
DEBUG: External scores from cache merged successfully - [LDK] [ldk_node::builder:1945] [Logger.swift: log(record:) line: 171]
INFOℹ️: Successfully loaded channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 at update_id 105 against monitor at update id 0 with 0 blocked updates - [LDK] [lightning::ln::channelmanager:16927] [Logger.swift: log(record:) line: 173]
WARN⚠️: Accepting stale ChannelMonitor for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973: monitor at update_id 0 but ChannelManager at 105. Forcing update_id sync. Monitor state will self-heal on next channel update. - [LDK] [lightning::ln::channelmanager:17370] [Logger.swift: log(record:) line: 175]
DEBUG: Got new ChannelMonitor for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 - [LDK] [lightning::chain::chainmonitor:1369] [Logger.swift: log(record:) line: 171]
INFOℹ️: Persistence of new ChannelMonitor for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 completed - [LDK] [lightning::chain::chainmonitor:1383] [Logger.swift: log(record:) line: 173]
DEBUG: Registering funding outpoint 73f98d08254ec799723375c5248c16058761d3dc294d869e88379abb7221bd71:0 with the filter to monitor confirmations - [LDK] [lightning::chain::channelmonitor:2178] [Logger.swift: log(record:) line: 171]
DEBUG: Registering outpoint 73f98d08254ec799723375c5248c16058761d3dc294d869e88379abb7221bd71:0 with the filter to monitor spend - [LDK] [lightning::chain::channelmonitor:2186] [Logger.swift: log(record:) line: 171]
INFOℹ️: Stale monitor recovery: build succeeded with accept_stale - Recovery [LightningService.swift: setup(walletIndex:electrumServerUrl:rgsServerUrl:channelMigration:) line: 167]
PERF: setup(walletIndex:electrumServerUrl:rgsServerUrl:channelMigration:) took 12.26 seconds on ldk queue [ServiceQueue.swift: background(_:_:functionName:) line: 58]
INFOℹ️: LDK node setup [LightningService.swift: setup(walletIndex:electrumServerUrl:rgsServerUrl:channelMigration:) line: 171]
DEBUG: Starting node... [LightningService.swift: start(onEvent:) line: 254]
INFOℹ️: Starting up LDK Node with node ID 02488fa2de6eaebfa6728581d58c2964f331237090d909f5bca3d0ed130376ba48 on network: regtest - [LDK] [ldk_node:241] [Logger.swift: log(record:) line: 173]
DEBUG: Fee rate estimation updated for OnchainPayment: 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for ChannelFunding: 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(MaximumFeeEstimate): 2775 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(UrgentOnChainSweep): 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(MinAllowedAnchorChannelRemoteFee): 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(MinAllowedNonAnchorChannelRemoteFee): 253 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(AnchorChannelFee): 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(NonAnchorChannelFee): 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(ChannelCloseMinimum): 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(OutputSpendingFee): 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
INFOℹ️: Fee rate cache update finished in 664ms. - [LDK] [ldk_node::chain::electrum:400] [Logger.swift: log(record:) line: 173]
INFOℹ️: External scores background syncing enabled from https://api.stag0.blocktank.to/scorer - [LDK] [ldk_node::scoring:24] [Logger.swift: log(record:) line: 173]
INFOℹ️: Stale monitor recovery: triggering commitment round-trips to heal monitors before starting chain sync... - [LDK] [ldk_node:662] [Logger.swift: log(record:) line: 173]
DEBUG: Calling ChannelManager's timer_tick_occurred on startup - [LDK] [lightning_background_processor:970] [Logger.swift: log(record:) line: 171]
INFOℹ️: Connecting to peer: 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc@34.65.86.104:9400 - [LDK] [ldk_node::connection:65] [Logger.swift: log(record:) line: 173]
DEBUG: Background sync of external scores started. - [LDK] [ldk_node::scoring:40] [Logger.swift: log(record:) line: 171]
DEBUG: Rebroadcasting monitor's pending claims on startup - [LDK] [lightning_background_processor:972] [Logger.swift: log(record:) line: 171]
INFOℹ️: Stale monitor recovery: tracking 1 channel(s) for healing. - [LDK] [ldk_node:696] [Logger.swift: log(record:) line: 173]
DEBUG: Calling time_passed on scorer at startup - [LDK] [lightning_background_processor:1170] [Logger.swift: log(record:) line: 171]
DEBUG: Finished noise handshake for connection with 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc - [LDK] [lightning::ln::peer_handler:1867] [Logger.swift: log(record:) line: 171]
DEBUG: Enqueueing message Init { features: [33, 81, 138, 10, 136, 152, 8, 128], networks: Some([06226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f]), remote_network_address: Some(TcpIpV4 { addr: [34, 65, 86, 104], port: 9400 }) } to 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc - [LDK] [lightning::ln::peer_handler:1719] [Logger.swift: log(record:) line: 171]
DEBUG: Processing RGS data... - [LDK] [lightning_rapid_gossip_sync::processing:67] [Logger.swift: log(record:) line: 171]
DEBUG: Failed to update network graph with RGS data: LightningError(LightningError { err: "Rapid Gossip Sync data is more than two weeks old", action: IgnoreError }) - [LDK] [ldk_node::gossip:115] [Logger.swift: log(record:) line: 171]
ERROR❌: Background sync of RGS gossip data failed: Failed to update gossip data. - [LDK] [ldk_node:305] [Logger.swift: log(record:) line: 177]
INFOℹ️: Received peer Init message from 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc: DataLossProtect: required, InitialRoutingSync: not supported, UpfrontShutdownScript: supported, GossipQueries: supported, VariableLengthOnion: required, StaticRemoteKey: required, PaymentSecret: required, BasicMPP: supported, Wumbo: supported, AnchorsNonzeroFeeHtlcTx: not supported, AnchorsZeroFeeHtlcTx: supported, RouteBlinding: supported, ShutdownAnySegwit: supported, DualFund: not supported, Taproot: supported, Quiescence: supported, OnionMessages: not supported, ProvideStorage: not supported, ChannelType: supported, SCIDPrivacy: supported, ZeroConf: supported, Trampoline: not supported, SimpleClose: not supported, SpliceProduction: not supported, SplicePrototype: not supported, AnchorZeroFeeCommitmentsStaging: not supported, HtlcHold: not supported, unknown flags: supported - [LDK] [lightning::ln::peer_handler:2150] [Logger.swift: log(record:) line: 173]
DEBUG: Generating channel_reestablish events for 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc - [LDK] [lightning::ln::channelmanager:13945] [Logger.swift: log(record:) line: 171]
DEBUG: Enough info to generate a Data Loss Protect with per_commitment_secret f6c93f7e1b966acaf05fc356c2f37f55951a4dd1d6f3419ac5c1a0f0a7cfc32c for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 - [LDK] [lightning::ln::channel:11835] [Logger.swift: log(record:) line: 171]
DEBUG: Handling SendChannelReestablish event in peer_handler for node 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 - [LDK] [lightning::ln::peer_handler:3111] [Logger.swift: log(record:) line: 171]
DEBUG: Enqueueing message ChannelReestablish { channel_id: 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973, next_local_commitment_number: 43, next_remote_commitment_number: 42, your_last_per_commitment_secret: [246, 201, 63, 126, 27, 150, 106, 202, 240, 95, 195, 86, 194, 243, 127, 85, 149, 26, 77, 209, 214, 243, 65, 154, 197, 193, 160, 240, 167, 207, 195, 44], my_current_per_commitment_point: PublicKey(02020202020202020202020202020202020202020202020202020202020202ffcee50f772e0a9972250d4b61b3e5beb95de897c73b4ed1cc35ed013accf1c840), next_funding: None, my_current_funding_locked: Some(FundingLocked { txid: 73f98d08254ec799723375c5248c16058761d3dc294d869e88379abb7221bd71, retransmit_flags: 0 }) } to 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc - [LDK] [lightning::ln::peer_handler:1719] [Logger.swift: log(record:) line: 171]
DEBUG: Received message ChannelReestablish(ChannelReestablish { channel_id: 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973, next_local_commitment_number: 43, next_remote_commitment_number: 42, your_last_per_commitment_secret: [100, 121, 74, 191, 71, 219, 32, 73, 62, 64, 59, 38, 151, 92, 196, 218, 195, 49, 29, 11, 125, 96, 242, 158, 247, 25, 247, 113, 148, 116, 144, 46], my_current_per_commitment_point: PublicKey(1fae6b7887064e1bff0c97ab9d262412fb3e08fd43fc083f878230765cd9ef58492799896116589d6abbe813b324d99aedd8c9dc9d9b10f2fda7cd9ca9139864), next_funding: None, my_current_funding_locked: None }) from 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc - [LDK] [lightning::ln::peer_handler:2400] [Logger.swift: log(record:) line: 171]
DEBUG: Reconnected channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 with no loss - [LDK] [lightning::ln::channel:10075] [Logger.swift: log(record:) line: 171]
DEBUG: Attempting to generate channel update for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 - [LDK] [lightning::ln::channelmanager:5174] [Logger.swift: log(record:) line: 171]
DEBUG: Generating channel update for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 - [LDK] [lightning::ln::channelmanager:5181] [Logger.swift: log(record:) line: 171]
DEBUG: Handling channel resumption for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 with no RAA, no commitment update, 0 pending forwards, 0 pending update_add_htlcs, not broadcasting funding, without channel ready, without announcement, without tx_signatures, without tx_abort - [LDK] [lightning::ln::channelmanager:9558] [Logger.swift: log(record:) line: 171]
DEBUG: Handling SendChannelUpdate event in peer_handler for node 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc for channel 17592186044416005151 - [LDK] [lightning::ln::peer_handler:3206] [Logger.swift: log(record:) line: 171]
DEBUG: Persisting ChannelManager... - [LDK] [lightning_background_processor:1080] [Logger.swift: log(record:) line: 171]
DEBUG: Done persisting ChannelManager. - [LDK] [lightning_background_processor:1108] [Logger.swift: log(record:) line: 171]
DEBUG: External scores merged successfully - [LDK] [ldk_node::scoring:106] [Logger.swift: log(record:) line: 171]
INFOℹ️: Stale monitor recovery: syncing chain tip... - [LDK] [ldk_node:715] [Logger.swift: log(record:) line: 173]
DEBUG: Starting transaction sync. - [LDK] [lightning_transaction_sync::electrum:92] [Logger.swift: log(record:) line: 171]
DEBUG: New best block: 52d8f195f49a0f872b0f7da0add8acfb3f2ae0602bdfa72e874f6618db61fd6c at height 92326 - [LDK] [lightning::ln::channelmanager:14218] [Logger.swift: log(record:) line: 171]
DEBUG: Not producing channel_ready: we do not need a commitment update - [LDK] [lightning::ln::channel:11077] [Logger.swift: log(record:) line: 171]
DEBUG: New best block 52d8f195f49a0f872b0f7da0add8acfb3f2ae0602bdfa72e874f6618db61fd6c at height 92326 provided via best_block_updated - [LDK] [lightning::chain::chainmonitor:1298] [Logger.swift: log(record:) line: 171]
DEBUG: Persisting ChannelManager... - [LDK] [lightning_background_processor:1080] [Logger.swift: log(record:) line: 171]
DEBUG: Syncing Channel Monitor for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 - [LDK] [lightning::chain::chainmonitor:583] [Logger.swift: log(record:) line: 171]
DEBUG: Done persisting ChannelManager. - [LDK] [lightning_background_processor:1108] [Logger.swift: log(record:) line: 171]
DEBUG: Finished syncing Channel Monitor for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 for block-data - [LDK] [lightning::chain::chainmonitor:595] [Logger.swift: log(record:) line: 171]
DEBUG: Finished transaction sync at tip 52d8f195f49a0f872b0f7da0add8acfb3f2ae0602bdfa72e874f6618db61fd6c in 1761ms: 0 confirmed, 0 unconfirmed. - [LDK] 

@ovitrif
Collaborator

ovitrif commented Mar 19, 2026

⚠️ PR must be updated to depend on ldk-node rc.36

@pwltr pwltr force-pushed the fix/stale-monitor-recovery-release branch from 22d8bb0 to 119957b Compare March 20, 2026 08:38
@claude
claude bot commented Mar 20, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.
