
feat: stale channel monitors recovery #855

Open
ovitrif wants to merge 3 commits into release-2.1.2 from fix/stale-monitor-recovery-v2


@ovitrif (Collaborator) commented Mar 18, 2026

Fixes: #847

This PR adds automatic recovery when the LDK node encounters stale channel monitors during startup, and bumps ldk-node to rc.34, which includes the upstream healing logic.

see also:

Description

When a channel monitor falls behind the channel manager (e.g. due to a faulty overwrite during an unfiltered migration from LDK to ldk-node), ldk-node refuses to start and returns a DangerousValue error to protect funds. This PR catches that error and retries the build with accept_stale_channel_monitors enabled, allowing ldk-node to accept the stale monitor and self-heal via commitment round-trips with the channel peer.

Unlike the approach in #854, this version always retries whenever ldk-node setup fails with DangerousValue.
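The retry flow described above can be sketched as follows. This is a minimal Rust illustration under stated assumptions, not the PR's actual code: BuildError, BuildOptions and build_with_stale_monitor_retry are hypothetical stand-ins, and the only names taken from the PR are the DangerousValue error, the accept_stale_channel_monitors flag and the log message quoted in the QA notes. The node build is passed in as a closure so the retry logic can be shown without depending on ldk-node's real builder API.

```rust
/// Hypothetical stand-in for the node builder's error type. Only the variant
/// relevant to this PR is modeled; this is NOT ldk-node's actual BuildError.
#[derive(Debug, Clone, PartialEq)]
pub enum BuildError {
    DangerousValue,
    Other(String),
}

/// Hypothetical build options. The field name mirrors the flag named in the
/// PR description, but this struct is illustrative only.
#[derive(Debug, Clone, Copy, Default)]
pub struct BuildOptions {
    pub accept_stale_channel_monitors: bool,
}

/// Attempt the build with default options. If, and only if, it fails with
/// DangerousValue, retry exactly once with stale channel monitors accepted.
/// Any other outcome (success or a different error) is returned unchanged.
pub fn build_with_stale_monitor_retry<N, F>(mut build: F) -> Result<N, BuildError>
where
    F: FnMut(BuildOptions) -> Result<N, BuildError>,
{
    match build(BuildOptions::default()) {
        Err(BuildError::DangerousValue) => {
            eprintln!(
                "Build failed with DangerousValue. Retrying with accept_stale_channel_monitors"
            );
            build(BuildOptions { accept_stale_channel_monitors: true })
        }
        other => other,
    }
}
```

Only DangerousValue triggers the second attempt, so a healthy wallet performs exactly one build, and unrelated failures are not masked by a retry.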

Also adds a 10s connection timeout to the node config, as required by ldk-node rc.33.
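A connection timeout bounds how long an outbound connection attempt may block before giving up. The sketch below illustrates the idea with Rust's standard TcpStream::connect_timeout; it assumes the timeout applies to establishing peer connections, and NodeConfig and connect_to_peer are hypothetical names, not ldk-node's configuration API.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical config holder; the field name is illustrative and does not
/// reflect ldk-node's actual configuration API.
#[derive(Debug, Clone)]
pub struct NodeConfig {
    pub connection_timeout: Duration,
}

impl Default for NodeConfig {
    fn default() -> Self {
        // 10 s, matching the value this PR adds to the node config.
        NodeConfig { connection_timeout: Duration::from_secs(10) }
    }
}

/// Open an outbound TCP connection to a peer, returning an error once the
/// configured timeout elapses instead of blocking on an unresponsive host.
pub fn connect_to_peer(config: &NodeConfig, addr: &SocketAddr) -> io::Result<TcpStream> {
    TcpStream::connect_timeout(addr, config.connection_timeout)
}
```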

Preview

Screenshot of app after recovery.

QA Notes

1. Normal startup (unaffected users)

  1. Install the build on a device with an existing wallet
  2. Launch the app
  3. Verify the node starts normally with no warnings in logs
  4. Verify all balances and channels appear correctly

2. Stale monitor recovery

  1. Reproduce the stale monitor state (overwrite a channel monitor in VSS with an older update_id)
  2. Launch the app; the first build fails with DangerousValue
  3. Verify the automatic retry succeeds and the node starts
  4. Check logs for "Build failed with DangerousValue. Retrying with accept_stale_channel_monitors"
  5. Verify "Stale monitor recovery: all monitors healed" appears in logs within ~15s
  6. Kill and relaunch the app; verify normal startup (no retry is needed, since the monitors are now healed)

3. Connection timeout (optional)

  1. Verify the node respects the 10s connection timeout (observable in poor network conditions)
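Steps 4 and 5 of the stale monitor recovery scenario can be checked automatically against captured logs. This is a minimal sketch, assuming the log has already been parsed into (milliseconds since launch, message) pairs; LogEntry and recovery_completed_within are hypothetical helpers, and only the two message strings come from the QA notes. It also assumes the ~15 s window is measured from the retry message.

```rust
/// Hypothetical parsed log entry: milliseconds since app launch plus the
/// message text. The real log line format is not specified in this PR.
pub struct LogEntry<'a> {
    pub millis: u64,
    pub message: &'a str,
}

// Message strings quoted in the QA notes.
const RETRY_MSG: &str =
    "Build failed with DangerousValue. Retrying with accept_stale_channel_monitors";
const HEALED_MSG: &str = "Stale monitor recovery: all monitors healed";

/// Returns true if the retry message appears and the "all monitors healed"
/// message follows it within `max_millis`.
pub fn recovery_completed_within(entries: &[LogEntry<'_>], max_millis: u64) -> bool {
    // Timestamp of the first retry message; no retry means no recovery ran.
    let Some(retry_at) = entries
        .iter()
        .find(|e| e.message.contains(RETRY_MSG))
        .map(|e| e.millis)
    else {
        return false;
    };
    // A healed message must come at or after the retry and within the window.
    entries.iter().any(|e| {
        e.message.contains(HEALED_MSG)
            && e.millis >= retry_at
            && e.millis - retry_at <= max_millis
    })
}
```

On a normal startup (scenario 1, or step 6 after healing) the retry message is absent, so the function returns false, which is the expected result there.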

🤖 Generated with Claude Code

@ovitrif changed the title from "feat: stale channel monitor recovery" to "feat: stale channel monitors recovery" on Mar 18, 2026
@piotr-iohk (Collaborator) commented Mar 19, 2026

Tested:

  • restore broken 2.1.0 wallet into 2.1.2 ✅
  • update 2.1.0 with broken wallet to 2.1.2 ✅
  • update from 2.0.3 (wallet with gap) to 2.1.2 ✅
  • update 2.1.0 with healthy wallet to 2.1.2 (regression check) ✅
  • 2.1.0 with broken wallet (advanced 150 blocks) -> update to 2.1.2 ✅

Each time, the wallet was operational and a channel was opened. LN payments were sent and received successfully.

@piotr-iohk (Collaborator) left a comment


Tested. LGTM 👌

@jvsena42 (Member)

Agree with all Claude suggestions

@piotr-iohk (Collaborator) commented Mar 19, 2026

Tested again:

  • restore broken 2.1.0 wallet into 2.1.2 ✅
  • update 2.1.0 with broken wallet to 2.1.2 ✅
  • update from 2.0.3 (wallet with gap) to 2.1.2 ✅
  • update from 2.0.3 (wallet with gap) -> 2.1.1 -> 2.1.2 ✅
  • update 2.1.0 with healthy wallet to 2.1.2 (regression check) ✅
  • 2.1.0 with broken wallet (advanced 600 blocks) -> update to 2.1.2 ✅

Each time, the wallet was operational. LN payments were sent and received successfully.

@ovitrif (Collaborator, Author) commented Mar 19, 2026

⚠️ PR must be updated to depend on ldk-node rc.36



Development

Successfully merging this pull request may close these issues.

[Bug]: VSS ChannelMonitor desync causes unrecoverable "LDK Build error: Read failed" on wallet restore

4 participants