0.2.2 accept stale monitors by ovitrif · Pull Request #1 · synonymdev/rust-lightning

ovitrif · 2026-03-19T19:06:04Z

Summary

Adds accept_stale_channel_monitors to ChannelManagerReadArgs, allowing nodes to recover from monitor desync (e.g. a migration or backup restore that overwrote a newer monitor with older data). When enabled, stale monitors have their update_id force-synced to the ChannelManager's expected value and their BOLT-3 commitment secrets tree reset, so the channel can self-heal on the next commitment round-trip.

Changes

1. `accept_stale_channel_monitors` field (channelmanager.rs)

New bool field on ChannelManagerReadArgs (default: false). When a monitor's update_id is behind the ChannelManager's, instead of returning DecodeError::DangerousValue, the deserialization path:

Logs a warning with both update_id values
Calls monitor.force_set_latest_update_id(target_id) to resync

2. `force_set_latest_update_id` with secrets tree reset (channelmonitor.rs)

New public method that:

Sets latest_update_id to the given value (skipping sequential validation)
Resets commitment_secrets to CounterpartyCommitmentSecrets::new() — clearing stale secrets that would fail cross-tree BOLT-3 derivation validation against new secrets from healing round-trips

3. Sentinel skip in `provide_secret` (chan_utils.rs)

Skip uninitialized sentinel slots (old_idx == 1 << 48) in the cross-check loop. Without this, ~50% of channels crash on the first post-reset provide_secret call — when the first secret index lands at position > 0, derive_secret checks against the all-zeros sentinel and always fails.

4. Test scaffolding (functional_test_utils.rs, reload_tests.rs)

Added accept_stale_channel_monitors: false to existing ChannelManagerReadArgs construction sites to compile with the new field.

Security analysis

Resetting the secrets tree

Concern	Assessment
Loss of penalty capability for pre-stale revoked commitments	The stale monitor lacks commitment transaction data for the gap period — it cannot construct penalty transactions even with the old secrets. The secrets alone provide no actionable security benefit.
Pre-stale commitments where monitor has HTLC data	Monitor loses secrets for these. However, the alternative (keeping stale secrets) causes a crash making the node completely non-functional, which is strictly worse.
Post-recovery exposure	After recovery, the channel should be closed as soon as practical to minimize the window where old secrets are unavailable.

Sentinel skip in `provide_secret`

Concern	Assessment
Could a normal channel hit the sentinel skip?	No. During sequential provision, sentinel slots at position P are always overwritten before position P+1 is first checked — the binary counting pattern guarantees this. The skip never triggers for non-stale channels.
Could an attacker craft index `1 << 48`?	No. Valid indices are `0` to `(1<<48)-1`. The commitment number determines the index via `(1<<48) - 1 - C`, and C starts at 0. Even at 1 commitment/second it takes ~9 million years to exhaust the space.
Are real cross-checks weakened?	No. Cross-checks between real (non-sentinel) slots remain fully enforced. The skip only applies to slots with the impossible sentinel index `1 << 48`.
When does the skip trigger?	Only after an explicit `CounterpartyCommitmentSecrets::new()` reset in `force_set_latest_update_id`.

Test plan

cargo build --all-targets — compiles without errors
cargo test --lib -p lightning — unit tests pass (includes existing CounterpartyCommitmentSecrets tests)
cargo fmt --check — no formatting issues in changed files
Manual test in bitkit app via ldk-node: restore a stale monitor, send a healing payment, verify no crash

Add a flag to ChannelManagerReadArgs that allows stale channel monitors to be accepted on startup instead of returning DangerousValue. When a stale monitor is detected, its update_id is force-synced to match the ChannelManager. The monitor's commitment state remains stale until the next real channel update (e.g. a commitment round-trip triggered by a fee update or probe). This enables recovery after monitor data was overwritten by a migration or backup restore. The caller should trigger a commitment round-trip after startup to heal the monitor state and recover revocation secrets via the derivation tree. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When a stale monitor is restored via accept_stale_channel_monitors and force_set_latest_update_id, the CounterpartyCommitmentSecrets tree retains secrets from before the update_id gap. When a healing payment triggers commitment round-trips, provide_secret() fails because the stale secrets occupy a different subtree in the BOLT-3 derivation tree than the new secrets. This causes update_monitor() -> Err -> chainmonitor mode mismatch -> Rust panic / SIGABRT. Two coordinated changes fix this: 1. channelmonitor.rs: Reset commitment_secrets to new() in force_set_latest_update_id so new secrets build a fresh, consistent derivation tree instead of failing cross-tree validation. 2. chan_utils.rs: Skip uninitialized sentinel slots (old_idx == 1 << 48) in provide_secret's cross-check loop. Without this, ~50% of channels crash on the first post-reset provide_secret call (when the first index lands at position > 0 and derive_secret checks against the all-zeros sentinel, which always fails). Security analysis — resetting the secrets tree: - Losing old secrets means we cannot build penalty transactions for revoked commitments from before the stale point. - However, the stale monitor lacks commitment transaction data for the gap period — it cannot construct penalty transactions even WITH the old secrets. The secrets alone provide no actionable security benefit. - For pre-stale commitments, the monitor has HTLC data but loses secrets. The alternative (keeping stale secrets) causes a crash that makes the node completely non-functional, which is strictly worse. - After recovery, the channel should be closed as soon as practical to minimize exposure during the window where old secrets are lost. Security analysis — sentinel skip in provide_secret: - 1 << 48 is outside the valid index range (0 to (1<<48)-1) and uniquely identifies uninitialized slots. No real secret can ever have this index. - During normal sequential provision (new channels), sentinel slots at position P are always overwritten before position P+1 is first checked — the binary counting pattern guarantees this. The sentinel skip never triggers for non-stale channels. - Cross-checks between real (non-sentinel) slots remain fully enforced. - The only way to reach the sentinel skip is after an explicit reset via force_set_latest_update_id (our stale recovery path). - Even at 1 commitment/second, it takes ~9 million years to exhaust the index space — it never wraps to 1 << 48. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ben-kaufman and others added 2 commits March 18, 2026 21:01

ovitrif self-assigned this Mar 19, 2026

ovitrif requested a review from ben-kaufman March 19, 2026 19:48

ovitrif mentioned this pull request Mar 19, 2026

feat: accept stale channel monitors for recovery synonymdev/ldk-node#76

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

0.2.2 accept stale monitors#1

0.2.2 accept stale monitors#1
ovitrif wants to merge 2 commits into0.2.2from
0.2.2-accept-stale-monitors

ovitrif commented Mar 19, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ovitrif commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

1. accept_stale_channel_monitors field (channelmanager.rs)

2. force_set_latest_update_id with secrets tree reset (channelmonitor.rs)

3. Sentinel skip in provide_secret (chan_utils.rs)

4. Test scaffolding (functional_test_utils.rs, reload_tests.rs)

Security analysis

Resetting the secrets tree

Sentinel skip in provide_secret

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ovitrif commented Mar 19, 2026 •

edited

Loading

1. `accept_stale_channel_monitors` field (channelmanager.rs)

2. `force_set_latest_update_id` with secrets tree reset (channelmonitor.rs)

3. Sentinel skip in `provide_secret` (chan_utils.rs)

Sentinel skip in `provide_secret`