feat: stale channel monitors recovery #502
Open
ben-kaufman wants to merge 9 commits into release-2.1.2
On BuildError.ReadFailed (likely a stale ChannelMonitor from a migration overwrite), automatically retry once with accept_stale_channel_monitors enabled. The ldk-node recovery flag force-syncs the monitor's update_id and heals commitment state via a delayed chain sync + keysend round-trip. A persisted UserDefaults flag ensures this only triggers once: it is set on any successful build (affected or not), preventing future retries.

Depends on: synonymdev/ldk-node#76

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ReadFailed fires for 19+ code paths (KVStore errors, deserialization failures, etc.). DangerousValue is the dedicated variant that only fires for the specific stale channel monitor case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the persisted staleMonitorRecoveryAttempted flag and always retry on DangerousValue. The flag was unnecessary: once monitors are healed, DangerousValue never fires again on subsequent startups. This matches the simpler approach in bitkit-android PR #855.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
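The retry described in these commits can be sketched roughly as follows. This is a minimal sketch, not the code from this PR: `BuildError` is a local stand-in for the ldk-node binding that models only the two variants named above, and `buildWithStaleMonitorRecovery`, `BuildNode` and the `build` closure are hypothetical names introduced for illustration.

```swift
// Local stand-in for the ldk-node BuildError binding (assumption: only the
// two variants discussed in this PR are modelled).
enum BuildError: Error {
    case readFailed
    case dangerousValue
}

// Hypothetical build step: receives the accept-stale flag, returns a node.
typealias BuildNode<Node> = (_ acceptStaleChannelMonitors: Bool) throws -> Node

// Build normally first. Only when the build fails with DangerousValue,
// retry exactly once with stale monitors accepted. Every other error,
// including ReadFailed, propagates to the caller without a retry.
func buildWithStaleMonitorRecovery<Node>(_ build: BuildNode<Node>) throws -> Node {
    do {
        return try build(false)
    } catch BuildError.dangerousValue {
        return try build(true)
    }
}

// Usage: a fake builder that fails until the flag is enabled.
var attempts: [Bool] = []
let node = try? buildWithStaleMonitorRecovery { (acceptStale: Bool) -> String in
    attempts.append(acceptStale)
    if !acceptStale { throw BuildError.dangerousValue }
    return "node"
}
```

No persisted flag is needed in this shape: a healthy wallet never reaches the `catch` branch, so the recovery path only runs while the monitors are actually stale.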
Update from c5698d0 (pre-rc.33, no monitor overwrite protection) to 153ecbe (rc.34), which includes:
- accept_stale_channel_monitors flag
- BuildError.DangerousValue variant
- Commitment secrets reset on force_set_latest_update_id
- Delayed chain sync with keysend-based healing
- Sentinel skip in provide_secret for reset trees

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rc.34 bindings require connectionTimeoutSecs as a non-optional field. Set to 10 seconds, matching Android PR #855.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
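As a rough illustration of why a non-optional field forces a change at the call site, consider the sketch below. The struct is a stand-in, not the generated binding: the real ElectrumSyncConfig has other fields that are omitted here, and the integer width is an assumption.

```swift
// Stand-in for the generated ElectrumSyncConfig binding. Only the field
// discussed in this commit is modelled; the UInt64 type is an assumption.
struct ElectrumSyncConfig {
    let connectionTimeoutSecs: UInt64
}

// The field is non-optional and has no default value, so the memberwise
// initializer requires it. Omitting the argument would not compile.
let syncConfig = ElectrumSyncConfig(connectionTimeoutSecs: 10)
```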
65037ab to 1b9fea2
The exhaustive switch on BuildError was missing the new DangerousValue variant from rc.34, which would cause a compile error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
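The compile error mentioned here follows from how Swift checks `switch` statements over enums. A minimal sketch, assuming a local stand-in for `BuildError` with only two variants and a hypothetical `describe` helper (neither is taken from the app's code):

```swift
// Local stand-in for the ldk-node BuildError binding; the real enum has
// more variants than the two modelled here.
enum BuildError: Error {
    case readFailed
    case dangerousValue // the variant this PR describes as new in rc.34
}

// Hypothetical helper that maps an error to a log message.
func describe(_ error: BuildError) -> String {
    // There is no `default:` clause, so the compiler rejects this switch
    // as soon as a variant is added to the enum and not handled here.
    switch error {
    case .readFailed:
        return "Failed to read node data from storage"
    case .dangerousValue:
        return "Stale channel monitors detected"
    }
}
```

Leaving out `default:` is a deliberate trade-off: a bindings update that adds a variant breaks the build instead of silently falling into a generic branch.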
jvsena42 reviewed Mar 19, 2026
This was referenced Mar 19, 2026
Collaborator
Tested:
Each time, the wallet was operational. LN payments were sent and received successfully.
ovitrif approved these changes Mar 19, 2026
jvsena42 (Member) reviewed Mar 19, 2026 and left a comment:
✅ Reproduced the bug: RN -> open channel -> v2.0.6 -> 21 payments -> v2.1.0 -> Error
✅ Happy path: RN -> open channel -> migrated to fix/stale-monitor-recovery-release
✅ Heal path: RN -> open channel -> v2.0.6 -> 21 payments -> v2.1.0 -> Error -> checkout to fix/stale-monitor-recovery-release -> recover channel
Recovery logs:
ERROR❌: Without the latest ChannelMonitor we cannot continue without risking funds. - [LDK] [lightning::ln::channelmanager:17388] [Logger.swift: log(record:) line: 177]
ERROR❌: Please ensure the chain::Watch API requirements are met and file a bug report at https://github.com/lightningdevkit/rust-lightning - [LDK] [lightning::ln::channelmanager:17389] [Logger.swift: log(record:) line: 177]
ERROR❌: Channel manager deserialization returned DangerousValue (stale channel monitors). Use set_accept_stale_channel_monitors(true) to recover: Value would be dangerous to continue execution with - [LDK] [ldk_node::builder:2026] [Logger.swift: log(record:) line: 177]
WARN⚠️: Build failed with DangerousValue. Retrying with accept_stale_channel_monitors for recovery. - Recovery [LightningService.swift: setup(walletIndex:electrumServerUrl:rgsServerUrl:channelMigration:) line: 147]
DEBUG: Loaded network graph from local cache with RGS timestamp 0 - [LDK] [ldk_node::builder:1802] [Logger.swift: log(record:) line: 171]
DEBUG: External scores from cache merged successfully - [LDK] [ldk_node::builder:1945] [Logger.swift: log(record:) line: 171]
INFOℹ️: Successfully loaded channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 at update_id 105 against monitor at update id 0 with 0 blocked updates - [LDK] [lightning::ln::channelmanager:16927] [Logger.swift: log(record:) line: 173]
WARN⚠️: Accepting stale ChannelMonitor for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973: monitor at update_id 0 but ChannelManager at 105. Forcing update_id sync. Monitor state will self-heal on next channel update. - [LDK] [lightning::ln::channelmanager:17370] [Logger.swift: log(record:) line: 175]
DEBUG: Got new ChannelMonitor for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 - [LDK] [lightning::chain::chainmonitor:1369] [Logger.swift: log(record:) line: 171]
INFOℹ️: Persistence of new ChannelMonitor for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 completed - [LDK] [lightning::chain::chainmonitor:1383] [Logger.swift: log(record:) line: 173]
DEBUG: Registering funding outpoint 73f98d08254ec799723375c5248c16058761d3dc294d869e88379abb7221bd71:0 with the filter to monitor confirmations - [LDK] [lightning::chain::channelmonitor:2178] [Logger.swift: log(record:) line: 171]
DEBUG: Registering outpoint 73f98d08254ec799723375c5248c16058761d3dc294d869e88379abb7221bd71:0 with the filter to monitor spend - [LDK] [lightning::chain::channelmonitor:2186] [Logger.swift: log(record:) line: 171]
INFOℹ️: Stale monitor recovery: build succeeded with accept_stale - Recovery [LightningService.swift: setup(walletIndex:electrumServerUrl:rgsServerUrl:channelMigration:) line: 167]
PERF: setup(walletIndex:electrumServerUrl:rgsServerUrl:channelMigration:) took 12.26 seconds on ldk queue [ServiceQueue.swift: background(_:_:functionName:) line: 58]
INFOℹ️: LDK node setup [LightningService.swift: setup(walletIndex:electrumServerUrl:rgsServerUrl:channelMigration:) line: 171]
DEBUG: Starting node... [LightningService.swift: start(onEvent:) line: 254]
INFOℹ️: Starting up LDK Node with node ID 02488fa2de6eaebfa6728581d58c2964f331237090d909f5bca3d0ed130376ba48 on network: regtest - [LDK] [ldk_node:241] [Logger.swift: log(record:) line: 173]
DEBUG: Fee rate estimation updated for OnchainPayment: 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for ChannelFunding: 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(MaximumFeeEstimate): 2775 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(UrgentOnChainSweep): 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(MinAllowedAnchorChannelRemoteFee): 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(MinAllowedNonAnchorChannelRemoteFee): 253 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(AnchorChannelFee): 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(NonAnchorChannelFee): 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(ChannelCloseMinimum): 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
DEBUG: Fee rate estimation updated for Lightning(OutputSpendingFee): 250 sats/kwu - [LDK] [ldk_node::chain::electrum:832] [Logger.swift: log(record:) line: 171]
INFOℹ️: Fee rate cache update finished in 664ms. - [LDK] [ldk_node::chain::electrum:400] [Logger.swift: log(record:) line: 173]
INFOℹ️: External scores background syncing enabled from https://api.stag0.blocktank.to/scorer - [LDK] [ldk_node::scoring:24] [Logger.swift: log(record:) line: 173]
INFOℹ️: Stale monitor recovery: triggering commitment round-trips to heal monitors before starting chain sync... - [LDK] [ldk_node:662] [Logger.swift: log(record:) line: 173]
DEBUG: Calling ChannelManager's timer_tick_occurred on startup - [LDK] [lightning_background_processor:970] [Logger.swift: log(record:) line: 171]
INFOℹ️: Connecting to peer: 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc@34.65.86.104:9400 - [LDK] [ldk_node::connection:65] [Logger.swift: log(record:) line: 173]
DEBUG: Background sync of external scores started. - [LDK] [ldk_node::scoring:40] [Logger.swift: log(record:) line: 171]
DEBUG: Rebroadcasting monitor's pending claims on startup - [LDK] [lightning_background_processor:972] [Logger.swift: log(record:) line: 171]
INFOℹ️: Stale monitor recovery: tracking 1 channel(s) for healing. - [LDK] [ldk_node:696] [Logger.swift: log(record:) line: 173]
DEBUG: Calling time_passed on scorer at startup - [LDK] [lightning_background_processor:1170] [Logger.swift: log(record:) line: 171]
DEBUG: Finished noise handshake for connection with 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc - [LDK] [lightning::ln::peer_handler:1867] [Logger.swift: log(record:) line: 171]
DEBUG: Enqueueing message Init { features: [33, 81, 138, 10, 136, 152, 8, 128], networks: Some([06226e46111a0b59caaf126043eb5bbf28c34f3a5e332a1fc7b2b73cf188910f]), remote_network_address: Some(TcpIpV4 { addr: [34, 65, 86, 104], port: 9400 }) } to 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc - [LDK] [lightning::ln::peer_handler:1719] [Logger.swift: log(record:) line: 171]
DEBUG: Processing RGS data... - [LDK] [lightning_rapid_gossip_sync::processing:67] [Logger.swift: log(record:) line: 171]
DEBUG: Failed to update network graph with RGS data: LightningError(LightningError { err: "Rapid Gossip Sync data is more than two weeks old", action: IgnoreError }) - [LDK] [ldk_node::gossip:115] [Logger.swift: log(record:) line: 171]
ERROR❌: Background sync of RGS gossip data failed: Failed to update gossip data. - [LDK] [ldk_node:305] [Logger.swift: log(record:) line: 177]
INFOℹ️: Received peer Init message from 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc: DataLossProtect: required, InitialRoutingSync: not supported, UpfrontShutdownScript: supported, GossipQueries: supported, VariableLengthOnion: required, StaticRemoteKey: required, PaymentSecret: required, BasicMPP: supported, Wumbo: supported, AnchorsNonzeroFeeHtlcTx: not supported, AnchorsZeroFeeHtlcTx: supported, RouteBlinding: supported, ShutdownAnySegwit: supported, DualFund: not supported, Taproot: supported, Quiescence: supported, OnionMessages: not supported, ProvideStorage: not supported, ChannelType: supported, SCIDPrivacy: supported, ZeroConf: supported, Trampoline: not supported, SimpleClose: not supported, SpliceProduction: not supported, SplicePrototype: not supported, AnchorZeroFeeCommitmentsStaging: not supported, HtlcHold: not supported, unknown flags: supported - [LDK] [lightning::ln::peer_handler:2150] [Logger.swift: log(record:) line: 173]
DEBUG: Generating channel_reestablish events for 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc - [LDK] [lightning::ln::channelmanager:13945] [Logger.swift: log(record:) line: 171]
DEBUG: Enough info to generate a Data Loss Protect with per_commitment_secret f6c93f7e1b966acaf05fc356c2f37f55951a4dd1d6f3419ac5c1a0f0a7cfc32c for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 - [LDK] [lightning::ln::channel:11835] [Logger.swift: log(record:) line: 171]
DEBUG: Handling SendChannelReestablish event in peer_handler for node 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 - [LDK] [lightning::ln::peer_handler:3111] [Logger.swift: log(record:) line: 171]
DEBUG: Enqueueing message ChannelReestablish { channel_id: 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973, next_local_commitment_number: 43, next_remote_commitment_number: 42, your_last_per_commitment_secret: [246, 201, 63, 126, 27, 150, 106, 202, 240, 95, 195, 86, 194, 243, 127, 85, 149, 26, 77, 209, 214, 243, 65, 154, 197, 193, 160, 240, 167, 207, 195, 44], my_current_per_commitment_point: PublicKey(02020202020202020202020202020202020202020202020202020202020202ffcee50f772e0a9972250d4b61b3e5beb95de897c73b4ed1cc35ed013accf1c840), next_funding: None, my_current_funding_locked: Some(FundingLocked { txid: 73f98d08254ec799723375c5248c16058761d3dc294d869e88379abb7221bd71, retransmit_flags: 0 }) } to 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc - [LDK] [lightning::ln::peer_handler:1719] [Logger.swift: log(record:) line: 171]
DEBUG: Received message ChannelReestablish(ChannelReestablish { channel_id: 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973, next_local_commitment_number: 43, next_remote_commitment_number: 42, your_last_per_commitment_secret: [100, 121, 74, 191, 71, 219, 32, 73, 62, 64, 59, 38, 151, 92, 196, 218, 195, 49, 29, 11, 125, 96, 242, 158, 247, 25, 247, 113, 148, 116, 144, 46], my_current_per_commitment_point: PublicKey(1fae6b7887064e1bff0c97ab9d262412fb3e08fd43fc083f878230765cd9ef58492799896116589d6abbe813b324d99aedd8c9dc9d9b10f2fda7cd9ca9139864), next_funding: None, my_current_funding_locked: None }) from 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc - [LDK] [lightning::ln::peer_handler:2400] [Logger.swift: log(record:) line: 171]
DEBUG: Reconnected channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 with no loss - [LDK] [lightning::ln::channel:10075] [Logger.swift: log(record:) line: 171]
DEBUG: Attempting to generate channel update for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 - [LDK] [lightning::ln::channelmanager:5174] [Logger.swift: log(record:) line: 171]
DEBUG: Generating channel update for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 - [LDK] [lightning::ln::channelmanager:5181] [Logger.swift: log(record:) line: 171]
DEBUG: Handling channel resumption for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 with no RAA, no commitment update, 0 pending forwards, 0 pending update_add_htlcs, not broadcasting funding, without channel ready, without announcement, without tx_signatures, without tx_abort - [LDK] [lightning::ln::channelmanager:9558] [Logger.swift: log(record:) line: 171]
DEBUG: Handling SendChannelUpdate event in peer_handler for node 028a8910b0048630d4eb17af25668cdd7ea6f2d8ae20956e7a06e2ae46ebcb69fc for channel 17592186044416005151 - [LDK] [lightning::ln::peer_handler:3206] [Logger.swift: log(record:) line: 171]
DEBUG: Persisting ChannelManager... - [LDK] [lightning_background_processor:1080] [Logger.swift: log(record:) line: 171]
DEBUG: Done persisting ChannelManager. - [LDK] [lightning_background_processor:1108] [Logger.swift: log(record:) line: 171]
DEBUG: External scores merged successfully - [LDK] [ldk_node::scoring:106] [Logger.swift: log(record:) line: 171]
INFOℹ️: Stale monitor recovery: syncing chain tip... - [LDK] [ldk_node:715] [Logger.swift: log(record:) line: 173]
DEBUG: Starting transaction sync. - [LDK] [lightning_transaction_sync::electrum:92] [Logger.swift: log(record:) line: 171]
DEBUG: New best block: 52d8f195f49a0f872b0f7da0add8acfb3f2ae0602bdfa72e874f6618db61fd6c at height 92326 - [LDK] [lightning::ln::channelmanager:14218] [Logger.swift: log(record:) line: 171]
DEBUG: Not producing channel_ready: we do not need a commitment update - [LDK] [lightning::ln::channel:11077] [Logger.swift: log(record:) line: 171]
DEBUG: New best block 52d8f195f49a0f872b0f7da0add8acfb3f2ae0602bdfa72e874f6618db61fd6c at height 92326 provided via best_block_updated - [LDK] [lightning::chain::chainmonitor:1298] [Logger.swift: log(record:) line: 171]
DEBUG: Persisting ChannelManager... - [LDK] [lightning_background_processor:1080] [Logger.swift: log(record:) line: 171]
DEBUG: Syncing Channel Monitor for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 - [LDK] [lightning::chain::chainmonitor:583] [Logger.swift: log(record:) line: 171]
DEBUG: Done persisting ChannelManager. - [LDK] [lightning_background_processor:1108] [Logger.swift: log(record:) line: 171]
DEBUG: Finished syncing Channel Monitor for channel 71bd2172bb9a37889e864d29dcd3618705168c24c575337299c74e25088df973 for block-data - [LDK] [lightning::chain::chainmonitor:595] [Logger.swift: log(record:) line: 171]
DEBUG: Finished transaction sync at tip 52d8f195f49a0f872b0f7da0add8acfb3f2ae0602bdfa72e874f6618db61fd6c in 1761ms: 0 confirmed, 0 unconfirmed. - [LDK]
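The update_id comparison visible in these logs (monitor at update_id 0, ChannelManager at 105) can be modelled roughly as below. This is a simplified model written for illustration, not LDK's implementation: it looks only at the two update_id values, and `MonitorCheck` and `checkMonitor` are hypothetical names.

```swift
// Possible outcomes of comparing a monitor against the channel manager.
enum MonitorCheck: Equatable {
    case notStale
    case acceptedStale(forcedUpdateId: UInt64)
    case dangerousValue
}

// Simplified model of the decision seen in the logs above.
func checkMonitor(monitorUpdateId: UInt64,
                  managerUpdateId: UInt64,
                  acceptStale: Bool) -> MonitorCheck {
    // A monitor at or ahead of the manager's update_id is not stale.
    guard monitorUpdateId < managerUpdateId else { return .notStale }
    // The monitor is behind the manager. Without the recovery flag the
    // build is refused; with it, the monitor's update_id is forced to
    // the manager's value.
    return acceptStale
        ? .acceptedStale(forcedUpdateId: managerUpdateId)
        : .dangerousValue
}

// Values taken from the log lines above: monitor at 0, manager at 105.
let firstAttempt = checkMonitor(monitorUpdateId: 0, managerUpdateId: 105, acceptStale: false)
let retryAttempt = checkMonitor(monitorUpdateId: 0, managerUpdateId: 105, acceptStale: true)
```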
22d8bb0 to 119957b
Code review: No issues found. Checked for bugs and CLAUDE.md compliance.
Summary
- Updates ldk-node to rc.34 (153ecbe), which includes stale monitor recovery + commitment secrets reset
- On BuildError.DangerousValue, automatically retries the build with accept_stale_channel_monitors enabled
- Sets connectionTimeoutSecs: 10 for ElectrumSyncConfig (new in rc.34)

Matches the Android PR #855 approach (always retry on DangerousValue, no one-shot flag needed).
How to test
Dependencies
🤖 Generated with Claude Code