Commit 34366cf

jacderida and claude committed
fix: prevent replication neighbor sync from blocking shutdown under active traffic
The neighbor sync loop ran `run_neighbor_sync_round()` outside of its `tokio::select!` block. When the shutdown token was cancelled mid-round, the task couldn't notice until the entire sync round completed. A round involves multiple network round-trips to peers that may themselves be shutting down, so shutdown could block for an extended time.

Wrap the sync round in a `tokio::select!` with `shutdown.cancelled()` so in-progress operations are cancelled immediately when shutdown fires.

Also add a 10-second timeout to the replication engine's task joins in `shutdown()` as defense in depth, matching the same pattern applied to `DhtNetworkManager::stop()`.

Discovered during auto-upgrade testing on a 151-node testnet with active client uploads. The DHT shutdown fix (saorsa-core) resolved 98% of hangs; this fix resolved the remaining 2%.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 27aa635 commit 34366cf

1 file changed

Lines changed: 27 additions & 14 deletions

src/replication/mod.rs

@@ -255,8 +255,16 @@ impl ReplicationEngine {
     /// released (e.g. before reopening the same LMDB environment).
     pub async fn shutdown(&mut self) {
         self.shutdown.cancel();
-        for handle in self.task_handles.drain(..) {
-            let _ = handle.await;
+        for (i, mut handle) in self.task_handles.drain(..).enumerate() {
+            match tokio::time::timeout(std::time::Duration::from_secs(10), &mut handle).await {
+                Ok(Ok(())) => {}
+                Ok(Err(e)) if e.is_cancelled() => {}
+                Ok(Err(e)) => warn!("Replication task {i} panicked during shutdown: {e}"),
+                Err(_) => {
+                    warn!("Replication task {i} did not stop within 10s, aborting");
+                    handle.abort();
+                }
+            }
         }
     }

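The bounded join in this hunk can be illustrated without tokio. The following is a minimal sketch using only the standard library, not the code from this commit: `spawn_with_done_signal`, `wait_with_deadline`, and `WaitOutcome` are hypothetical names, and a channel with `recv_timeout` stands in for `tokio::time::timeout` on a `JoinHandle`. Unlike a tokio task, a `std::thread` cannot be aborted, so the sketch only reports the timeout instead of calling anything like `handle.abort()`.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// How a bounded wait on a worker ended (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum WaitOutcome {
    Finished,
    TimedOut,
    Died,
}

/// Runs `work` on a new thread and returns a receiver that gets one message
/// when the work returns normally.
fn spawn_with_done_signal<F>(work: F) -> Receiver<()>
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        work();
        // Ignore the send error if the waiter has already given up.
        let _ = done_tx.send(());
    });
    done_rx
}

/// Waits for the worker for at most `limit`, so a stuck worker cannot block
/// the caller's shutdown path indefinitely.
fn wait_with_deadline(done: &Receiver<()>, limit: Duration) -> WaitOutcome {
    match done.recv_timeout(limit) {
        Ok(()) => WaitOutcome::Finished,
        Err(RecvTimeoutError::Timeout) => WaitOutcome::TimedOut,
        // The sender was dropped without sending, e.g. the worker panicked.
        Err(RecvTimeoutError::Disconnected) => WaitOutcome::Died,
    }
}
```

The three outcomes roughly mirror the match arms in the hunk above: clean exit, timeout, and a task that ended abnormally.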
@@ -435,18 +443,23 @@ impl ReplicationEngine {
                             debug!("Neighbor sync triggered by topology change");
                         }
                     }
-                    run_neighbor_sync_round(
-                        &p2p,
-                        &storage,
-                        &paid_list,
-                        &queues,
-                        &config,
-                        &sync_state,
-                        &sync_history,
-                        &is_bootstrapping,
-                        &bootstrap_state,
-                    )
-                    .await;
+                    // Wrap the sync round in a select so shutdown cancels
+                    // in-progress network operations rather than waiting for
+                    // the full round to complete.
+                    tokio::select! {
+                        () = shutdown.cancelled() => break,
+                        _ = run_neighbor_sync_round(
+                            &p2p,
+                            &storage,
+                            &paid_list,
+                            &queues,
+                            &config,
+                            &sync_state,
+                            &sync_history,
+                            &is_bootstrapping,
+                            &bootstrap_state,
+                        ) => {}
+                    }
                 }
                 debug!("Neighbor sync loop shut down");
             });
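
The `select!` wrapper helps because, when the shutdown branch wins, the sync-round future is dropped and stops at whatever await point it had reached. The sketch below illustrates that effect with a hand-driven poll loop using only the standard library. It is not how `tokio::select!` is implemented and not code from this repository: `FakeRound`, `drive_round`, and the `shutdown_after_trips` parameter are hypothetical, and each poll stands in for one network round-trip.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; sufficient for a loop that polls by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for one sync round. Each poll models one network
/// round-trip, and the round completes after `total_trips` of them.
struct FakeRound {
    trips_done: usize,
    total_trips: usize,
}

impl Future for FakeRound {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        self.trips_done += 1;
        if self.trips_done >= self.total_trips {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

/// Drives one round and returns how many round-trips it performed.
/// `shutdown_after_trips` simulates shutdown firing once that many trips are
/// done. With `race_shutdown == false` the round always runs to completion
/// (the old behaviour); with `true` it is dropped once shutdown has fired.
fn drive_round(
    total_trips: usize,
    shutdown_after_trips: Option<usize>,
    race_shutdown: bool,
) -> usize {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut round = FakeRound { trips_done: 0, total_trips };
    loop {
        let shutdown_fired =
            shutdown_after_trips.map_or(false, |n| round.trips_done >= n);
        if race_shutdown && shutdown_fired {
            // Returning drops `round`, so its remaining trips never run.
            return round.trips_done;
        }
        if Pin::new(&mut round).poll(&mut cx).is_ready() {
            return round.trips_done;
        }
    }
}
```

With a five-trip round and shutdown arriving after two trips, `drive_round(5, Some(2), false)` performs all five trips, while `drive_round(5, Some(2), true)` stops after two.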
