feat(tunnel): pipelined polls with adaptive depth, wseq ordering, STUN blocking #1115
Merged
therealaleph merged 10 commits on May 16, 2026
Conversation
Force-pushed from af703b1 to 5c010ed
…nt reads

Three improvements to full-tunnel throughput and latency:

1. **Overlapped client reads**: tunnel_loop reads from the client socket concurrently with the batch reply wait via tokio::select!, buffering upload data for the next op instead of blocking on a fresh read timeout.
2. **Pipelined polls with seq echo**: add a per-op sequence number echoed by the tunnel-node so the client can reorder out-of-order replies. Sessions with sustained data flow (consecutive_data >= 2) ramp up to MAX_INFLIGHT_PER_SESSION polls in flight, with a 1s stagger between sends so they land in separate batches. Drops to serial on the first empty reply.
3. **Adaptive pipeline depth**: idle sessions stay at depth 1 (no extra polls). Data-bearing sessions gradually ramp 1→2→3→...→10. At most MAX_ELEVATED_PER_DEPLOYMENT (6) sessions per deployment can be elevated simultaneously, preventing semaphore exhaustion. Elevation slots are released immediately on the first empty reply or on session close.

Wire protocol: BatchOp and TunnelResponse gain an optional `seq` field. Fully backward compatible — old tunnel-nodes ignore the field, and new clients fall back to serial (depth 1) when resp.seq is None.

Tunnel-node: LONGPOLL_DEADLINE reduced from 15s to 4s for faster poll turnaround while keeping persistent connections (Telegram) stable.

Includes bench-pipeline.sh for comparing serial vs pipelined throughput.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
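The ramp/drop policy described in the commit above can be sketched as a pure function. This is a minimal illustration, not the PR's actual code: `next_depth` and its argument shape are hypothetical, while the constants and transitions (depth 1→10, consecutive_data >= 2, drop to serial on first empty reply) come from the commit message.

```rust
// Hypothetical sketch of the adaptive-depth policy; the constant value
// mirrors the ramp ceiling quoted in the commit message.
const MAX_INFLIGHT_PER_SESSION: u32 = 10;

/// One step of the ramp: sessions with sustained data flow
/// (consecutive_data >= 2) grow depth 1 -> 2 -> 3 -> ... -> 10;
/// the first empty reply drops straight back to serial (depth 1).
fn next_depth(depth: u32, consecutive_data: u32, reply_was_empty: bool) -> u32 {
    if reply_was_empty {
        1 // drop to serial on the first empty reply
    } else if consecutive_data >= 2 {
        (depth + 1).min(MAX_INFLIGHT_PER_SESSION)
    } else {
        depth // idle sessions stay where they are (depth 1)
    }
}

fn main() {
    // A data-bearing session ramps toward the cap...
    let mut d = 1;
    for _ in 0..12 {
        d = next_depth(d, 2, false);
    }
    assert_eq!(d, MAX_INFLIGHT_PER_SESSION);
    // ...and collapses to serial on the first empty reply.
    assert_eq!(next_depth(d, 2, true), 1);
    println!("ok");
}
```

The key property is the asymmetry: growth is gradual (one step per non-empty reply) but collapse is immediate, so an idle session never holds extra polls in flight.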
…STUN blocking

Pipeline improvements:
- Optimistic start at depth 2 (free, no permit); drop to 1 after 2 consecutive empties
- Elevation permit only for depth 3+ with a 32KB download threshold (prevents keep-alive sessions like Telegram from over-elevating)
- Fast-path uploads bypass the full pipeline with a +4 cap and 20ms coalesce
- Data-op preference: 20ms client read check before sending empty polls
- 1s stagger always applied for batch separation
- Client socket close breaks immediately (no waiting for in-flight polls)
- consecutive_data no longer resets on single empties

Android:
- Pipeline debug overlay (SYSTEM_ALERT_WINDOW) with per-session tracking
- Tokio worker threads 4 (was 2) to prevent burst stalls
- STUN/TURN port blocking (3478/5349/19302) for instant WebRTC TCP fallback

Tunnel-node:
- LONGPOLL_DEADLINE 4s (must stay below the client batch timeout)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
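The STUN/TURN blocking above reduces to a small port filter. A minimal sketch, assuming a `block_stun` flag as described; the function name is illustrative, only the port list (3478/5349/19302) is from the commit message.

```rust
// Illustrative port filter for the STUN/TURN blocking described above.
// 3478 = STUN/TURN, 5349 = TURN over TLS, 19302 = Google's public STUN.
fn is_blocked_stun_port(block_stun: bool, dst_port: u16) -> bool {
    // With block_stun enabled (the default), dropping these ports makes
    // WebRTC give up on UDP candidates quickly and fall back to TCP.
    block_stun && matches!(dst_port, 3478 | 5349 | 19302)
}

fn main() {
    assert!(is_blocked_stun_port(true, 3478));
    assert!(!is_blocked_stun_port(true, 443)); // normal HTTPS passes through
    assert!(!is_blocked_stun_port(false, 19302)); // toggle off: no blocking
    println!("ok");
}
```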
Tunnel-node:
- Drain loop: keep reading until the buffer is empty (max 1s), accumulating up to 2MB+ per drain for streaming video (was 100KB)
- Upload size logging for debugging
- 512KB reader buffer (was 64KB)
- LONGPOLL_DEADLINE 4s

Client:
- INFLIGHT_ACTIVE 4 (was 10) to prevent semaphore exhaustion
- Upload loop-read in the initial path (1s max, accumulates fat uploads)
- Fast-path 200ms coalesce loop (was a single 20ms read)
- 32KB download threshold for elevation (prevents keep-alive sessions like Telegram from over-elevating)
- consecutive_data no longer resets on single empties
- block_stun config (default true) with Android UI toggle
- 512KB client read buffer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
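The capped drain loop above can be sketched over any `std::io::Read` source. This is a simplified sketch under stated assumptions: the real loop is async and also enforces the 1s wall-clock cap, both omitted here; `drain` and `MAX_DRAIN` are illustrative names, and the 512KB/2MB sizes come from the commit messages.

```rust
use std::io::Read;

// Sketch of the capped drain loop: keep reading while the source has
// data, accumulating up to `max_total` bytes per drain pass.
const READ_BUF: usize = 512 * 1024; // 512KB reader buffer (was 64KB)
const MAX_DRAIN: usize = 2 * 1024 * 1024; // ~2MB per drain for streaming video

fn drain<R: Read>(src: &mut R, max_total: usize) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf = vec![0u8; READ_BUF];
    while out.len() < max_total {
        let want = buf.len().min(max_total - out.len());
        let n = src.read(&mut buf[..want])?;
        if n == 0 {
            break; // buffer empty (or EOF): stop draining
        }
        out.extend_from_slice(&buf[..n]);
    }
    Ok(out)
}

fn main() {
    // A 3MB burst is capped at one 2MB drain; the rest waits for the next pass.
    let data = vec![7u8; 3 * 1024 * 1024];
    let got = drain(&mut &data[..], MAX_DRAIN).unwrap();
    assert_eq!(got.len(), MAX_DRAIN);
    println!("ok");
}
```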
Architecture:
- Upload task (spawned): reads client socket → sends MuxMsg::Data with wseq directly to mux → sends InflightEntry to the download task. Fully independent, never blocked by downloads.
- Download task (inline): processes replies, sends refill polls (timer), accepts InflightEntry. Never blocked by uploads.
- Lock-free mpsc channels throughout — no Mutex contention.

Write ordering (wseq):
- Client assigns a monotonic wseq to data-bearing ops only (not polls).
- Tunnel-node buffers out-of-order writes per session and flushes them in wseq order. Backward compatible: old clients without wseq write immediately.
- Fixes data corruption from pipelined batches completing out of order.

Upload accumulation:
- Adaptive: 50ms initial window for small messages (low latency).
- If >= 32KB accumulated, extend to a 1s / 1MB cap (fat uploads for files).

Other:
- Removed the consecutive_empty gate on refill (it was killing idle sessions).
- Tunnel-node reader buffer 2MB (was 512KB).
- Removed legacy detection (it was false-triggering on merged replies).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
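The per-session wseq ordering described above is a classic reorder buffer. A minimal sketch, assuming the semantics stated in the commit (buffer out-of-order writes, flush contiguous runs in wseq order, write immediately when wseq is absent); the `WriteOrderer` type and method names are hypothetical.

```rust
use std::collections::BTreeMap;

// Sketch of per-session wseq ordering on the tunnel-node: out-of-order
// writes are buffered and flushed strictly in wseq order; ops without a
// wseq (old clients) are written immediately for backward compatibility.
struct WriteOrderer {
    next_wseq: u64,
    pending: BTreeMap<u64, Vec<u8>>,
}

impl WriteOrderer {
    fn new() -> Self {
        Self { next_wseq: 0, pending: BTreeMap::new() }
    }

    /// Returns the payloads now ready to write to the client socket, in
    /// order. `wseq: None` models a legacy client: write immediately.
    fn accept(&mut self, wseq: Option<u64>, data: Vec<u8>) -> Vec<Vec<u8>> {
        let Some(seq) = wseq else { return vec![data] };
        self.pending.insert(seq, data);
        let mut ready = Vec::new();
        // Flush the contiguous run starting at next_wseq.
        while let Some(d) = self.pending.remove(&self.next_wseq) {
            ready.push(d);
            self.next_wseq += 1;
        }
        ready
    }
}

fn main() {
    let mut o = WriteOrderer::new();
    // wseq 1 arrives before wseq 0 (pipelined batches completed out of
    // order): it is buffered, not written.
    assert!(o.accept(Some(1), b"b".to_vec()).is_empty());
    // wseq 0 fills the gap, so both flush in order.
    let flushed = o.accept(Some(0), b"a".to_vec());
    assert_eq!(flushed, vec![b"a".to_vec(), b"b".to_vec()]);
    println!("ok");
}
```

Buffering only until the contiguous run is complete is what fixes the corruption: bytes can never reach the client socket ahead of a lower-numbered write.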
Reduces upload chunk size to prevent large video uploads from starving heartbeat polls in shared batches.

Adaptive accumulation:
- 50ms initial window, 10ms per-read gap timeout
- >= 8KB accumulated triggers an extended 1s window (capped at 256KB)
- Smaller chunks clear batches faster, so heartbeats get through

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
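The two-stage accumulation policy above can be sketched as a function from bytes accumulated so far to the current limits. A hedged sketch: the 8KB trigger, 1s extended window, and 256KB cap are from this commit, while the function name, return shape, and the 8KB small-message cap are illustrative assumptions.

```rust
use std::time::Duration;

// Sketch of the adaptive accumulation policy. Returns (total window,
// byte cap) for the current accumulation pass.
fn accumulation_limits(accumulated: usize) -> (Duration, usize) {
    if accumulated >= 8 * 1024 {
        // Fat upload in progress: extend the window, but cap the chunk at
        // 256KB so batches clear quickly and heartbeat polls get through.
        (Duration::from_secs(1), 256 * 1024)
    } else {
        // Small message: short 50ms window for low latency. The 8KB cap
        // here is an assumption matching the extension trigger.
        (Duration::from_millis(50), 8 * 1024)
    }
}

fn main() {
    // A chat-sized message stays in the fast 50ms window...
    assert_eq!(accumulation_limits(1024).0, Duration::from_millis(50));
    // ...while a video upload extends to 1s, capped at 256KB per chunk.
    let (window, cap) = accumulation_limits(64 * 1024);
    assert_eq!(window, Duration::from_secs(1));
    assert_eq!(cap, 256 * 1024);
    println!("ok");
}
```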
- Mux channel unbounded (was 512) — prevents an upload flood from blocking the download task's poll sends and ack processing
- Pipeline debug functions no-op'd — std::sync::Mutex was blocking tokio workers under contention during heavy uploads
- Upload accumulation yields between reads
- Added batch response mismatch logging (r.len vs sent ops)
- Open issue: r.len()=0 from Apps Script during heavy uploads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Upload semaphore: max 3 unacked data ops per session (TCP-like flow control). The permit is held in the inflight future until the reply arrives.
- Suppress refill polls while data ops are in flight — prevents upload acks from being delayed behind slow poll responses in pending_writes.
- data_ops_in_flight counter tracks active upload ops per session.
- upload_cap config field (default 3, not yet wired to the Android UI).

Root cause of the video upload stall: r.len()=0 batch responses from Apps Script when batches are large (19+ ops). Needs Apps Script investigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
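The flow-control rules above (cap of 3 unacked data ops, refill suppression while any are in flight) can be sketched as a counter-based gate. This is a synchronous sketch, not the PR's tokio semaphore: `UploadGate` and its methods are hypothetical names, and only `upload_cap`, `data_ops_in_flight`, and the default of 3 come from the commit.

```rust
// Sketch of the TCP-like upload flow control described above: at most
// `upload_cap` unacked data ops per session; a permit is taken when a
// data op is sent and released when its reply (ack) arrives.
struct UploadGate {
    upload_cap: u32,        // config field, default 3
    data_ops_in_flight: u32,
}

impl UploadGate {
    fn new(upload_cap: u32) -> Self {
        Self { upload_cap, data_ops_in_flight: 0 }
    }

    /// Try to take a permit before sending a data op.
    fn try_send(&mut self) -> bool {
        if self.data_ops_in_flight < self.upload_cap {
            self.data_ops_in_flight += 1;
            true
        } else {
            false // back-pressure: stop reading the client socket
        }
    }

    /// Release the permit when the op's reply arrives.
    fn on_ack(&mut self) {
        self.data_ops_in_flight = self.data_ops_in_flight.saturating_sub(1);
    }

    /// Refill polls are suppressed while any data op is in flight, so
    /// upload acks are never queued behind slow poll responses.
    fn suppress_refill(&self) -> bool {
        self.data_ops_in_flight > 0
    }
}

fn main() {
    let mut g = UploadGate::new(3);
    assert!(g.try_send() && g.try_send() && g.try_send());
    assert!(!g.try_send()); // 4th data op blocked until an ack arrives
    assert!(g.suppress_refill());
    g.on_ack();
    assert!(g.try_send());
    println!("ok");
}
```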
The split upload/download task architecture caused video upload stalls: upload ack responses were delayed behind slow poll responses in the pending_writes ordering buffer. The single loop naturally serializes uploads with reply processing, giving steady ack delivery.

Single-loop keeps all pipelining benefits (elevated polls, adaptive depth, fast-path uploads) while avoiding the ordering issue. Removed the dead upload_cap config field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 5c010ed to 377add3
therealaleph approved these changes on May 16, 2026
therealaleph (Owner) left a comment:
Verified locally before merge:
- cargo test --lib (240 passed)
- cargo test --manifest-path tunnel-node/Cargo.toml (38 passed)
- cargo build --release
- cargo build --manifest-path tunnel-node/Cargo.toml --release
- cargo build --bin mhrv-rs-ui --release --features ui
- ANDROID_HOME=/Library/Android/sdk ANDROID_SDK_ROOT=/Library/Android/sdk JAVA_HOME=/Applications/Android\ Studio.app/Contents/jbr/Contents/Home ./gradlew :app:compileDebugKotlin
Also pushed a small maintainer fix to the PR branch for the test-only SessionInner initializers, the capped TCP drain loop, and desktop block_stun config/UI wiring.
Answered via LLM, Supervised @therealaleph
therealaleph added a commit that referenced this pull request on May 16, 2026
Ship PR #1115 from @yyoyoian-pixel: adaptive pipelined Full-mode polls, wseq-ordered tunnel-node writes, default STUN/TURN UDP blocking for faster WebRTC TCP fallback, and Android/desktop config support for the new block_stun path.

Local release gates passed on macOS: cargo test --lib, tunnel-node tests, cargo build --release, tunnel-node release build, desktop UI release build, and Android compileDebugKotlin with the Android Studio JBR and the local SDK.
Summary
Pipelined full-tunnel polls with adaptive pipeline depth, write-sequence (wseq) ordering on the tunnel-node, and STUN/TURN blocking for fast WebRTC TCP fallback.
Pipelining
- Idle sessions stay at depth 1; data-bearing sessions ramp up to MAX_INFLIGHT_PER_SESSION, dropping to serial on the first empty reply
- Elevation permits (depth 3+) gated by a 32KB download threshold, max 6 elevated sessions per deployment

Write ordering (wseq)
- Client assigns a monotonic wseq to data-bearing ops (not polls); tunnel-node flushes buffered writes in wseq order
- Old clients without wseq write immediately (backward compatible)

STUN/TURN blocking
- block_stun config (default true) with Android UI toggle

Tunnel-node improvements
- wseq-ordered writes, 2MB reader buffer, capped drain loop, LONGPOLL_DEADLINE 4s

Android
- block_stun toggle in Advanced settings

Other
- consecutive_empty gate removed from refill (was killing idle sessions)

Files changed
- src/tunnel_client.rs — pipelining, fast-path, wseq, timer refill, single-loop
- src/domain_fronter.rs — wseq field on BatchOp
- src/proxy_server.rs — STUN blocking
- src/config.rs — block_stun config
- src/android_jni.rs — pipelineDebugJson JNI, worker_threads=4
- tunnel-node/src/main.rs — wseq ordering, 2MB reader, drain loop, LONGPOLL 4s
- android/ — ConfigStore, HomeScreen, PipelineDebugOverlay, MhrvVpnService, Native, Manifest

Test plan
🤖 Generated with Claude Code