fix: prevent replication neighbor sync from blocking shutdown under active traffic #70
Merged
jacderida merged 1 commit into rc-2026.4.1 on Apr 14, 2026
…ctive traffic

The neighbor sync loop ran `run_neighbor_sync_round()` outside of its `tokio::select!` block. When shutdown was signalled mid-round, the task couldn't notice until the entire sync round completed. A round involves multiple network round-trips to peers that may themselves be shutting down, which caused extended blocking.

Wrap the sync round in a `tokio::select!` with `shutdown.cancelled()` so in-progress operations are cancelled immediately when shutdown fires. Also add a 10-second timeout to the replication engine's task joins in `shutdown()` as defense in depth, matching the same pattern applied to `DhtNetworkManager::stop()`.

Discovered during auto-upgrade testing on a 151-node testnet with active client uploads. The DHT shutdown fix (saorsa-core) resolved 98% of hangs; this fix resolved the remaining 2%.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
95d6dcf to 34366cf
Addressed the same timeout-detach issue flagged by Greptile on the companion saorsa-core PR (saorsa-labs/saorsa-core#81). The replication engine's …
mickvandijke approved these changes Apr 14, 2026
Summary
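- The neighbor sync loop ran `run_neighbor_sync_round()` outside its `tokio::select!` block, preventing shutdown cancellation from being noticed during long sync rounds
- Wrap the sync round in a `tokio::select!` with `shutdown.cancelled()` so in-progress operations are cancelled immediately
- Add a 10-second timeout to the replication engine's task joins in `shutdown()` as defense in depth

Context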
Discovered during auto-upgrade testing on a 100-node testnet with active client uploads. The companion fix in saorsa-core (saorsa-labs/saorsa-core#fix/dht-shutdown-hang-under-traffic) resolved the DHT shutdown hang for 98% of nodes. The remaining 2% were stuck at `engine.shutdown().await` in the replication engine, where the neighbor sync task was mid-round when shutdown fired.

Test plan
- `cargo check` clean

Depends on: saorsa-labs/saorsa-core PR for the DHT shutdown fix (same root cause pattern, different code layer)
🤖 Generated with Claude Code