
Prevent HTLC double-forwards, prune forwarded onions#4303

Open
valentinewallace wants to merge 11 commits into lightningdevkit:main from valentinewallace:2025-12-reconstruct-fwds-followup-2

Conversation

@valentinewallace
Contributor

@valentinewallace valentinewallace commented Jan 6, 2026

Closes #4280, partially addresses #4286

Based on #4289

@ldk-reviews-bot

ldk-reviews-bot commented Jan 6, 2026

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@codecov

codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 88.01020% with 47 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.02%. Comparing base (d555994) to head (18268f2).
⚠️ Report is 17 commits behind head on main.

Files with missing lines | Patch % | Lines
lightning/src/ln/channelmanager.rs | 87.94% | 23 Missing and 11 partials ⚠️
lightning/src/ln/channel.rs | 88.63% | 9 Missing and 1 partial ⚠️
lightning/src/util/ser.rs | 40.00% | 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #4303    +/-   ##
========================================
  Coverage   86.01%   86.02%            
========================================
  Files         156      156            
  Lines      102766   103103   +337     
  Branches   102766   103103   +337     
========================================
+ Hits        88396    88695   +299     
- Misses      11864    11897    +33     
- Partials     2506     2511     +5     
Flag | Coverage Δ
tests | 86.02% <88.01%> (+<0.01%) ⬆️


@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from 01aa4c0 to 4012864 on January 7, 2026 21:50
@valentinewallace valentinewallace self-assigned this Jan 8, 2026
@valentinewallace valentinewallace added the weekly goal (Someone wants to land this this week) label Jan 8, 2026
@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from 4012864 to e2ea1ed on January 8, 2026 21:16
@valentinewallace valentinewallace marked this pull request as ready for review January 8, 2026 21:16
@valentinewallace valentinewallace removed the request for review from wpaulino January 8, 2026 21:16
@valentinewallace
Contributor Author

valentinewallace commented Jan 14, 2026

Pushed a rebase on main due to #4289 landing

@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from c40741b to 3f0f811 on January 29, 2026 16:09
@valentinewallace valentinewallace changed the title from "Follow-ups to #4227 (Part 2)" to "Prevent HTLC double-forwards, prune forwarded onions" Jan 29, 2026
@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from 3f0f811 to fce5071 on January 29, 2026 16:27
@joostjager
Contributor

Haven't reviewed yet, but the following question came up: in the previous PR and in this PR, double-forward bugs are fixed. Couldn't the fuzzer have detected this?

@joostjager joostjager left a comment

The logic in this PR is quite complex, with multiple interacting state machines (inbound HTLC state, outbound HTLC state, holding cells, monitor updates) that need to stay consistent across restarts. The fact that multiple double-forward bugs were discovered during development suggests the state space is large enough that targeted unit tests may not provide sufficient coverage.

I keep wondering if the existing fuzzer can exercise these restart scenarios with in-flight HTLCs at various stages. The current strategy of adding a specific regression test might be too reactive?

Forwarded {
/// Useful if we need to fail or claim this HTLC backwards after restart, if it's missing in the
/// outbound edge.
hop_data: HTLCPreviousHopData,
Contributor

I think there is some data in here that is redundant with InboundHTLCOutput. Not sure if we care

Contributor Author

I'm not sure either. @TheBlueMatt any opinion? This is the simplest way so I might save the dedup for followup.

Collaborator

Hmm, yea, would still be quite nice to reduce size here but it's alright to do it in a followup (just has to happen in 0.3!)

Contributor Author

@TheBlueMatt @joostjager I sketched this out: 9c8a92e

It adds a fair amount of code and only saves the htlc_id and cltv_expiry fields. Does that seem worth it? I'm not super convinced, but I guess it could add up.

Contributor

Hm yes, I don't know if it is worth it. Note that in WithOnion there are also duplications. I think it could have been

    WithOnion { 
        onion_routing_packet: OnionPacket,
        skimmed_fee_msat: Option<u64>,
        blinding_point: Option<PublicKey>,
    },

Generally I like deduplicated data. No risk of an invalid state. But no strong opinion here.

Collaborator

Wait can't we also drop the channel_id, outpoint, counterparty_node_id, user_channel_id, and prev_outbound_scid_alias?

Contributor Author

channel_id and prev_outbound_scid_alias are used in fail_htlc_backwards_internal. counterparty_node_id, user_channel_id, and outpoint are used in claim_funds_internal / claim_funds_from_hop.

@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from fce5071 to 84093ca on February 2, 2026 19:58
@valentinewallace
Contributor Author

Rebased since #4332 landed, haven't pushed updates addressing feedback yet

valentinewallace and others added 3 commits February 2, 2026 16:15
We recently added support for reconstructing
ChannelManager::decode_update_add_htlcs on startup, using data present in the
Channels. However, we failed to prune HTLCs from this rebuilt map if a given
inbound HTLC was already forwarded to the outbound edge and in the outbound
holding cell (this bug could've caused us to double-forward HTLCs; fortunately
it never shipped).

As part of fixing this bug, we clean up the overall pruning approach by:
1. If the Channel is open, then it is the source of truth for what HTLCs are
outbound+pending (including pending in the holding cell)
2. If the Channel is closed, then the corresponding ChannelMonitor is the
source of truth for what HTLCs are outbound+pending

Previously, we would only consider the monitor's pending HTLCs, which ignored
holding cell HTLCs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This cleanup falls out of the changes made in the previous commit. Separated
out here for reviewability.
We recently added support for reconstructing
ChannelManager::decode_update_add_htlcs on startup, using data present in the
Channels. However, we failed to prune HTLCs from this rebuilt map if a given
HTLC was already forwarded+removed from the outbound edge and resolved in the
inbound edge's holding cell.

Here we fix this bug, which would have caused us to double-forward HTLCs;
fortunately, it never shipped.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
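
For illustration, a minimal sketch of the source-of-truth rule from the first commit message above (Channel when open, ChannelMonitor when closed); the types and helper names here are stand-ins, not LDK's actual API:

    use std::collections::HashSet;

    // Hypothetical identifier standing in for LDK's real HTLC identifiers.
    #[derive(Clone, Copy, PartialEq, Eq, Hash)]
    struct HtlcId(u64);

    /// Picks the set of outbound+pending HTLCs to treat as already forwarded:
    /// an open Channel (including its holding cell) is the source of truth,
    /// while a closed channel falls back to its ChannelMonitor.
    fn already_forwarded(
        channel_is_open: bool,
        channel_pending: HashSet<HtlcId>, // includes holding-cell HTLCs
        monitor_pending: HashSet<HtlcId>,
    ) -> HashSet<HtlcId> {
        if channel_is_open { channel_pending } else { monitor_pending }
    }

    /// Any inbound HTLC whose forward already appears in that set is dropped
    /// from the rebuilt decode_update_add_htlcs map so it cannot be forwarded
    /// a second time after restart.
    fn prune_rebuilt_forwards(rebuilt: &mut Vec<HtlcId>, outbound_pending: &HashSet<HtlcId>) {
        rebuilt.retain(|id| !outbound_pending.contains(id));
    }
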
@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from 84093ca to 94815bd on February 2, 2026 23:21
@valentinewallace
Contributor Author

Addressed feedback: diff

@valentinewallace
Contributor Author

The logic in this PR is quite complex, with multiple interacting state machines (inbound HTLC state, outbound HTLC state, holding cells, monitor updates) that need to stay consistent across restarts. The fact that multiple double-forward bugs were discovered during development suggests the state space is large enough that targeted unit tests may not provide sufficient coverage.

I keep wondering if the existing fuzzer can exercise these restart scenarios with in-flight HTLCs at various stages. The current strategy of adding a specific regression test might be too reactive?

I agree on the fuzzing; it's a little tricky to add, so I'm still working on it and will probably save it for a follow-up.

@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from 94815bd to 89600ea on February 3, 2026 17:41
@valentinewallace
Contributor Author

Pushed a fix for CI, also split out another commit: diff

@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from 89600ea to f0f269c on February 3, 2026 19:08
@valentinewallace
Contributor Author

CI fix part 2: diff

In 0.3+, we are taking steps to remove the requirement of regularly persisting
the ChannelManager and instead rebuild the set of HTLC forwards (and the
manager generally) from Channel{Monitor} data.

We previously merged support for reconstructing the
ChannelManager::decode_update_add_htlcs map from channel data, using a new
HTLC onion field that will be present for inbound HTLCs received on 0.3+ only.

However, we now want to add support for pruning this field once it's no longer
needed so it doesn't get persisted every time the manager gets persisted. At
the same time, in a future LDK version we need to detect whether the field was
ever present to begin with to prevent upgrading with legacy HTLCs present.

We accomplish both by converting the plain update_add option that was
previously serialized into an enum that can indicate whether the HTLC is from
0.2 or earlier versus from 0.3+ with its onion pruned (a variant for the latter
is added in the next commit).

Actual pruning of the new update_add field is added in the next commit.
We store inbound committed HTLCs' onions in Channels for use in reconstructing
the pending HTLC set on ChannelManager read. If an HTLC has been forwarded to
the outbound edge, we no longer need to persist the inbound edge's onion and
can prune it here.
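
Reading the previous two commit messages together with the diff hunks quoted earlier in this thread (the Forwarded and WithOnion variants), the serialized per-HTLC field plausibly ends up shaped roughly like the sketch below. The stand-in structs replace LDK's OnionPacket, PublicKey, and HTLCPreviousHopData, and the variant/field names are illustrative rather than exact:

    // Stand-ins for the LDK types referenced in the quoted diff hunks.
    struct OnionPacket;
    struct PublicKey;
    struct HTLCPreviousHopData;

    // Illustrative shape of the enum that replaces the plain update_add option.
    enum InboundUpdateAdd {
        // Received on LDK 0.2 or earlier: no onion was stored, so this forward
        // cannot be reconstructed from Channel data, and a future version can
        // detect (and refuse) upgrading while such HTLCs are still present.
        Legacy,
        // Received on 0.3+ with the onion still held, so the
        // decode_update_add_htlcs map can be rebuilt on startup.
        WithOnion {
            onion_routing_packet: OnionPacket,
            skimmed_fee_msat: Option<u64>,
            blinding_point: Option<PublicKey>,
        },
        // Received on 0.3+ but already irrevocably forwarded to the outbound
        // edge: the onion is pruned and only the data needed to fail or claim
        // backwards after restart is kept.
        Forwarded {
            hop_data: HTLCPreviousHopData,
        },
    }
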
We recently merged (test-only, for now) support for the ChannelManager
reconstructing its set of pending HTLCs from Channel{Monitor} data, rather than
using its own persisted maps. But because we want test coverage of both the new
reconstruction codepaths as well as the old persisted map codepaths,
in tests we would decide between those two sets of codepaths randomly.

We now want to add some tests that require using the new codepaths, so here we
add an option to explicitly set whether to reconstruct or not rather than
choosing randomly.
Cleans it up a bit in preparation for adding a new variant in the next commit.
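
A sketch of the deterministic test switch described in the "reconstruct or not" commit message above; the enum and function are assumptions for illustration, not the PR's actual test API:

    // Hypothetical test-configuration knob replacing the purely random choice.
    enum ReconstructMode {
        // Default: keep covering both codepaths by flipping a coin per read.
        Random,
        // Force the new reconstruct-from-Channel{Monitor} codepaths.
        Always,
        // Force the legacy persisted-map codepaths.
        Never,
    }

    fn should_reconstruct(mode: ReconstructMode, coin_flip: bool) -> bool {
        match mode {
            ReconstructMode::Random => coin_flip,
            ReconstructMode::Always => true,
            ReconstructMode::Never => false,
        }
    }
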
@valentinewallace valentinewallace force-pushed the 2025-12-reconstruct-fwds-followup-2 branch from f0f269c to 4e69514 on February 3, 2026 22:53
joostjager previously approved these changes Feb 4, 2026

@joostjager joostjager left a comment

Logic appears sound. However... while I've reviewed the code multiple times, I'm still not confident that I (or anyone) can fully reason through every implication of these changes. The interaction between Channel state, ChannelMonitor state, holding cells, and the various reconstruction codepaths during deserialization has grown complex enough that it's hard to have high confidence that all edge cases are covered.

No criticism of the PR, but it raises a broader question. Has ChannelManager deserialization reached a complexity threshold where changes carry inherent risk that's hard to mitigate through review alone? Hopefully the fuzzing will be a good additional gating condition.

The PR is also quite large. Splitting it into the bug fixes and the rest might have been helpful.

// to ensure the legacy codepaths also have test coverage.
#[cfg(not(test))]
let reconstruct_manager_from_monitors = false;
let reconstruct_manager_from_monitors = _version >= RECONSTRUCT_HTLCS_FROM_CHANS_VERSION;
Contributor

The comment above is no longer accurate.

Wasn't it the idea to now hit this condition in tests too?

Contributor Author

Updated the comment. We do hit this condition in tests, see the "Deterministic reconstruct_manager option in tests" commit and the usage of the field.

channel_id: monitor.channel_id(),
htlc_id: SentHTLCId::from_source(&htlc_source),
});
for (htlc_source, payment_hash) in monitor.get_onchain_failed_outbound_htlcs() {
Contributor

This is now only running when the channel is closed, whereas previously it ran on a wider set of conditions. I guess ok, because onchain is only relevant when closed?

Contributor Author

I guess ok, because onchain is only relevant when closed?

Yeah, exactly

// received on an old pre-0.0.123 version of LDK. In this case, the HTLC is
// required to be resolved prior to upgrading to 0.1+ per CHANGELOG.md.
update_add_htlc_opt: None,
// received on LDK 0.1-.
Contributor

It's not easy to follow all the versions and upgrade states. Is this comment change indeed in line with the commit message and the InboundUpdateAdd::Legacy docs?

Contributor Author

Yeah, the previous comment was just wrong :( Misread the changelog.

},
Err(_) => {
let rand_val =
std::collections::hash_map::RandomState::new().build_hasher().finish();
Contributor

It's a fact now, but I still fear that we'll get in trouble with this. I did a quick test run and chanmgr::read with random is only hit in 50 tests. Parameterization wouldn't be completely out of the question. Or running that limited lightning test set another time in ci-test.sh with the env flag set to rebuild.

@TheBlueMatt TheBlueMatt left a comment

LGTM except for the last hunk in the second-to-last commit. Really have no idea what's going on there.

for (hash, hop_data, outbound_amt_msat) in
mem::take(&mut already_forwarded_htlcs).drain(..)
{
if hash != payment_hash {
Collaborator

I'm really confused by this hunk. This loop is really just being used for inbound payments, not forwarded ones, so this seems like it should belong further up where we're creating pending_claims_to_replay. I'm not really sure why this is in the loop at all - it looks at the channel we're iterating over when scanning pending_claims_to_replay but not when looking at the already_forwarded_htlcs entry?

We're also keying by payment hash - skipping any failures that happen to have the same payment hash as a forwarded payment, possibly resulting in unrelated HTLCs (eg an MPP that got forwarded through us twice) getting forgotten (not that its critical to forget them, just that its weird).

Contributor Author

Discussed quite a bit offline. I added a fixup converting already_forwarded_htlcs to a hashmap.

The conclusion of the discussion was, instead of the current approach, to: "when iterating through stored preimages, rather than using already_forwarded_htlcs, ask the channel for its list of [all] inbound htlcs and add the unclaimed ones in general to pending_claims_to_replay."

However, I think the only HTLCs we can easily add to pending_claims_to_replay are the ones that are already in already_forwarded_htlcs. All other types of inbound unclaimed HTLCs don't have an HTLCSource available. So the discussed approach seems like a decent amount of extra code to manually construct the HTLCSources, for what seem like unreachable codepaths (since if the monitor and the manager are that out-of-sync about HTLC state, then the manager is stale and we probably FC'd the channel anyway).

I think a big part of the complaint about this code hunk is that it wasn't accessing data that was specific to a channel, so that part should at least be fixed now with the hashmap update.
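
For illustration only, one plausible shape of the hashmap conversion mentioned above, keyed so the replay loop only looks at entries for the channel it is currently iterating over; the key and value types here are guesses, not the fixup's exact types:

    use std::collections::HashMap;

    // Stand-ins for LDK's real types.
    #[derive(PartialEq, Eq, Hash)]
    struct ChannelId([u8; 32]);
    struct PaymentHash([u8; 32]);
    struct HTLCPreviousHopData;

    // Before: a flat Vec matched by payment hash, which can conflate unrelated
    // HTLCs that share a hash (e.g. an MPP forwarded through us twice).
    // After: entries grouped by the inbound channel they arrived on; the u64
    // is the outbound amount in msat.
    type AlreadyForwardedHtlcs =
        HashMap<ChannelId, Vec<(PaymentHash, HTLCPreviousHopData, u64)>>;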

In a recent commit, we added support for pruning an inbound HTLC's persisted
onion once the HTLC has been irrevocably forwarded to the outbound edge.

Here, we add a check on startup that those inbound HTLCs were actually handled.
Specifically, we check that the inbound HTLC is either (a) currently present in
the outbound edge or (b) was removed via claim. If neither of those are true,
we infer that the HTLC was removed from the outbound edge via fail and fail the
inbound HTLC backwards.
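
A minimal sketch of that startup check, assuming the outbound edge's state can be summarized as below; the names are hypothetical, not the PR's:

    // Hypothetical summary of what happened to the forwarded HTLC on the
    // outbound edge, as observed while re-reading the ChannelManager.
    enum OutboundStatus {
        // (a) The HTLC is still pending in the outbound channel.
        StillPending,
        // (b) The HTLC was removed from the outbound edge via claim.
        RemovedViaClaim,
        // Neither (a) nor (b): it must have been removed via fail.
        Missing,
    }

    fn resolve_pruned_inbound_htlc(status: OutboundStatus, fail_backwards: impl FnOnce()) {
        match status {
            // Nothing to do; the normal claim/forward paths handle these.
            OutboundStatus::StillPending | OutboundStatus::RemovedViaClaim => {},
            // Infer the outbound HTLC was failed and fail the inbound HTLC
            // backwards so the upstream node isn't left hanging.
            OutboundStatus::Missing => fail_backwards(),
        }
    }
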
In 0.3+, we are taking steps to remove the requirement of regularly persisting
the ChannelManager and instead rebuild the set of HTLC forwards (and the
manager generally) from Channel{Monitor} data.

We previously merged support for reconstructing the
ChannelManager::decode_update_add_htlcs map from channel data, using a new
HTLC onion field that will be present for inbound HTLCs received on 0.3+ only.
The plan is that in upcoming LDK versions, the manager will reconstruct this
map and the other forward/claimable/pending HTLC maps will automatically
repopulate themselves on the next call to process_pending_htlc_forwards.

As such, once we're in a future version that reconstructs the pending HTLC set,
we can stop persisting the legacy ChannelManager maps such as forward_htlcs and
pending_intercepted_htlcs, since they will never be used.

For 0.3 to be compatible with this future version, in this commit we detect
that the manager was last written on a version of LDK that doesn't persist the
legacy maps. In that case, we don't try to read the old forwards map and
instead run only the new reconstruction logic.
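
A sketch of the compatibility gate described above, with a hypothetical constant name and value standing in for the PR's actual serialization-version threshold:

    // Hypothetical version at which the legacy maps stop being written.
    const LEGACY_MAPS_NOT_PERSISTED_VERSION: u8 = 42;

    /// Decides, while reading a serialized ChannelManager, whether to skip the
    /// legacy forward_htlcs / pending_intercepted_htlcs maps and rely solely
    /// on reconstruction from Channel{Monitor} data.
    fn use_reconstruction_only(manager_written_version: u8) -> bool {
        manager_written_version >= LEGACY_MAPS_NOT_PERSISTED_VERSION
    }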

Labels

weekly goal (Someone wants to land this this week)


Development

Successfully merging this pull request may close these issues.

#4227 Followups

4 participants