
feat(tunnel): pipelined polls with adaptive depth, wseq ordering, STUN blocking #1115

Merged
therealaleph merged 10 commits into therealaleph:main from yyoyoian-pixel:feat/pipeline-tunnel-polls on May 16, 2026

Conversation

Contributor

yyoyoian-pixel commented on May 13, 2026

Summary

Pipelined full-tunnel with adaptive pipeline depth, write-sequence ordering on the tunnel-node, and WebRTC TCP fallback.

Pipelining

  • Optimistic start at depth 2 — every session begins with 2 in-flight polls (free, no elevation permit)
  • Adaptive ramp to depth 4 — sessions with sustained download data (>32KB) elevate with permit
  • Fast-path uploads — when pipeline is full, upload data bypasses depth cap (+4 extra ops)
  • Timer-based refill — non-blocking 100ms ticks in the select loop; refill polls go out after the 1s stagger
  • Single-loop architecture — upload reads and reply processing in one select loop for natural back-pressure
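
A rough sketch of the adaptive-depth refill described above, in the spirit of the select-loop timer (all names here — `Session`, `refill_tick`, `send_poll`, the constants — are illustrative, not the actual identifiers in src/tunnel_client.rs):

```rust
use std::time::Duration;

const DEPTH_BASE: usize = 2;          // optimistic start: two free in-flight polls
const DEPTH_MAX: usize = 4;           // elevated depth for sustained downloads
const ELEVATE_BYTES: u64 = 32 * 1024; // 32KB download threshold for elevation

struct Session {
    in_flight: usize,
    downloaded: u64,
    elevated: bool, // true once this session holds an elevation permit
}

impl Session {
    fn target_depth(&self) -> usize {
        if self.elevated { DEPTH_MAX } else { DEPTH_BASE }
    }
}

/// One 100ms refill tick of the single select loop: elevate if the session
/// has downloaded enough, then top the pipeline up to its target depth.
async fn refill_tick(session: &mut Session, mut send_poll: impl FnMut()) {
    tokio::time::sleep(Duration::from_millis(100)).await;
    if !session.elevated && session.downloaded > ELEVATE_BYTES {
        session.elevated = true; // the real client acquires an elevation permit here
    }
    while session.in_flight < session.target_depth() {
        send_poll();
        session.in_flight += 1;
    }
}
```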

Write ordering (wseq)

  • Client assigns monotonic wseq to data-bearing ops (not polls)
  • Tunnel-node buffers out-of-order writes per session, flushes in wseq order
  • Backward compatible: old clients without wseq write immediately
  • Prevents TLS corruption from pipelined batches completing out of order
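
For intuition, the tunnel-node's per-session write ordering can be pictured as a small reorder buffer keyed by wseq. This is only a sketch with hypothetical names (`WseqBuffer`, `push`); the actual code is in tunnel-node/src/main.rs:

```rust
use std::collections::BTreeMap;

/// Per-session reorder buffer: data ops carry a monotonic wseq and are
/// flushed to the upstream socket strictly in wseq order.
struct WseqBuffer {
    next: u64,                       // wseq assumed to start at 0 in this sketch
    pending: BTreeMap<u64, Vec<u8>>, // out-of-order arrivals parked here
}

impl WseqBuffer {
    fn new() -> Self {
        Self { next: 0, pending: BTreeMap::new() }
    }

    /// Accept one op; return every chunk now ready to write, in order.
    /// Ops without a wseq (old clients) are written immediately.
    fn push(&mut self, wseq: Option<u64>, data: Vec<u8>) -> Vec<Vec<u8>> {
        let Some(seq) = wseq else { return vec![data] };
        self.pending.insert(seq, data);
        let mut ready = Vec::new();
        while let Some(chunk) = self.pending.remove(&self.next) {
            ready.push(chunk);
            self.next += 1;
        }
        ready
    }
}
```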

STUN/TURN blocking

  • block_stun config (default true) with Android UI toggle
  • Rejects STUN/TURN ports (3478/5349/19302) so WebRTC apps (Meet, WhatsApp) instantly fall back to TCP TURN
  • Eliminates 10-30s ICE negotiation timeout
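
The rejection itself boils down to a destination-port filter; a minimal sketch (the port list is from this PR, the function and parameter names are assumptions, not the code in src/proxy_server.rs):

```rust
/// 3478 (STUN/TURN), 5349 (TURN over TLS), 19302 (Google's public STUN).
/// Rejecting these up front makes WebRTC apps skip UDP ICE and fall back
/// to TCP TURN immediately instead of waiting out the negotiation timeout.
const STUN_TURN_PORTS: [u16; 3] = [3478, 5349, 19302];

fn should_block(block_stun: bool, dst_port: u16) -> bool {
    block_stun && STUN_TURN_PORTS.contains(&dst_port)
}
```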

Tunnel-node improvements

  • LONGPOLL_DEADLINE 4s (must stay below client batch timeout)
  • Reader buffer 2MB (was 64KB)
  • Drain loop: keeps reading until buffer empty (max 1s), accumulates up to 2MB+ per drain
  • Upload size logging
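
Roughly, the drain loop keeps issuing short reads until nothing more is buffered, the cap is hit, or 1s has elapsed. A sketch under those assumptions (function and constant names are illustrative):

```rust
use std::time::{Duration, Instant};
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

const DRAIN_CAP: usize = 2 * 1024 * 1024;                // accumulate up to ~2MB per drain
const DRAIN_DEADLINE: Duration = Duration::from_secs(1); // never drain longer than 1s

/// Keep reading until the socket has nothing more buffered, the cap is
/// reached, or the deadline passes; return everything accumulated.
async fn drain(stream: &mut TcpStream) -> std::io::Result<Vec<u8>> {
    let started = Instant::now();
    let mut buf = vec![0u8; 2 * 1024 * 1024]; // 2MB reader buffer
    let mut out = Vec::new();
    while out.len() < DRAIN_CAP && started.elapsed() < DRAIN_DEADLINE {
        match tokio::time::timeout(Duration::from_millis(10), stream.read(&mut buf)).await {
            Ok(Ok(0)) => break,                    // peer closed
            Ok(Ok(n)) => out.extend_from_slice(&buf[..n]),
            Ok(Err(e)) => return Err(e),
            Err(_) => break,                       // nothing buffered right now
        }
    }
    Ok(out)
}
```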

Android

  • Pipeline debug overlay (SYSTEM_ALERT_WINDOW) — temporary, shows session depths and events
  • Tokio worker threads: 4 (was 2)
  • block_stun toggle in Advanced settings

Other

  • Legacy detection removed (was false-triggering)
  • consecutive_empty gate removed from refill (was killing idle sessions)
  • 32KB download threshold for elevation (prevents keep-alive sessions from over-elevating)
  • Unbounded mux channel (prevents upload flood from blocking downloads)

Files changed

  • src/tunnel_client.rs — pipelining, fast-path, wseq, timer refill, single-loop
  • src/domain_fronter.rs — wseq field on BatchOp
  • src/proxy_server.rs — STUN blocking
  • src/config.rs — block_stun config
  • src/android_jni.rs — pipelineDebugJson JNI, worker_threads=4
  • tunnel-node/src/main.rs — wseq ordering, 2MB reader, drain loop, LONGPOLL 4s
  • android/ — ConfigStore, HomeScreen, PipelineDebugOverlay, MhrvVpnService, Native, Manifest

Test plan

  • Pipelining: sessions ramp 2→3→4, downloads overlap
  • Fast-path: uploads bypass full pipeline
  • wseq ordering: tunnel-node logs show in-order writes
  • STUN blocking: Google Meet connects instantly via TCP TURN
  • Video upload: starts immediately, no stall (single-loop)
  • Telegram messaging: messages send with expected delay
  • Debug overlay: shows sessions, depth, events
  • Long-running stability test

🤖 Generated with Claude Code

github-actions bot added the type: feature label (feat: PR — auto-applied by release-drafter) on May 13, 2026
yyoyoian-pixel changed the title from feat(tunnel): pipelined polls with adaptive depth to feat(tunnel): pipelined polls with adaptive depth, wseq ordering, STUN blocking on May 14, 2026
yyoyoian-pixel force-pushed the feat/pipeline-tunnel-polls branch 2 times, most recently from af703b1 to 5c010ed on May 14, 2026 21:58
yyoyoian-pixel and others added 9 commits May 15, 2026 00:01
…nt reads

Three improvements to full-tunnel throughput and latency:

1. **Overlapped client reads**: tunnel_loop reads from the client socket
   concurrently with the batch reply wait via tokio::select!, buffering
   upload data for the next op instead of blocking on a fresh read timeout.

2. **Pipelined polls with seq echo**: add a per-op sequence number echoed
   by the tunnel-node so the client can reorder out-of-order replies.
   Sessions with sustained data flow (consecutive_data >= 2) ramp up to
   MAX_INFLIGHT_PER_SESSION polls in flight, with 1s stagger between sends
   so they land in separate batches. Drops to serial on first empty reply.

3. **Adaptive pipeline depth**: idle sessions stay at depth 1 (no extra
   polls). Data-bearing sessions gradually ramp 1→2→3→...→10. At most
   MAX_ELEVATED_PER_DEPLOYMENT (6) sessions per deployment can be elevated
   simultaneously, preventing semaphore exhaustion. Elevation slots are
   released immediately on first empty reply or session close.

Wire protocol: BatchOp and TunnelResponse gain an optional `seq` field.
Fully backward compatible — old tunnel-nodes ignore the field, new clients
fall back to serial (depth 1) when resp.seq is None.
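
For illustration only, the optional `seq` field might be carried like this (every field other than `seq` is an assumption; the real structs live in src/domain_fronter.rs):

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct BatchOp {
    session_id: u64, // hypothetical field, for illustration
    data: Vec<u8>,   // hypothetical field, for illustration
    /// Echoed back by new tunnel-nodes; old nodes simply ignore it.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    seq: Option<u64>,
}

#[derive(Serialize, Deserialize)]
struct TunnelResponse {
    session_id: u64, // hypothetical field, for illustration
    data: Vec<u8>,   // hypothetical field, for illustration
    /// None => old tunnel-node, so the client drops back to depth 1.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    seq: Option<u64>,
}
```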

Tunnel-node: LONGPOLL_DEADLINE reduced from 15s to 4s for faster poll
turnaround while keeping persistent connections (Telegram) stable.

Includes bench-pipeline.sh for comparing serial vs pipelined throughput.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…STUN blocking

Pipeline improvements:
- Optimistic start at depth 2 (free, no permit), drop to 1 on 2 consecutive empties
- Elevation permit only for depth 3+ with 32KB download threshold (prevents
  keep-alive sessions like Telegram from over-elevating)
- Fast-path uploads bypass full pipeline with +4 cap and 20ms coalesce
- Data-op preference: 20ms client read check before sending empty polls
- 1s stagger always applied for batch separation
- Client socket close breaks immediately (no waiting for in-flight polls)
- consecutive_data no longer resets on single empties

Android:
- Pipeline debug overlay (SYSTEM_ALERT_WINDOW) with per-session tracking
- Tokio worker threads 4 (was 2) to prevent burst stalls
- STUN/TURN port blocking (3478/5349/19302) for instant WebRTC TCP fallback

Tunnel-node:
- LONGPOLL_DEADLINE 4s (must stay below client batch timeout)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tunnel-node:
- Drain loop: keep reading until buffer empty (max 1s), accumulates
  up to 2MB+ per drain for streaming video (was 100KB)
- Upload size logging for debugging
- 512KB reader buffer (was 64KB)
- LONGPOLL_DEADLINE 4s

Client:
- INFLIGHT_ACTIVE 4 (was 10) to prevent semaphore exhaustion
- Upload loop-read in initial path (1s max, accumulates fat uploads)
- Fast-path 200ms coalesce loop (was single 20ms read)
- 32KB download threshold for elevation (prevents keep-alive sessions
  like Telegram from over-elevating)
- consecutive_data no longer resets on single empties
- block_stun config (default true) with Android UI toggle
- 512KB client read buffer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Architecture:
- Upload task (spawned): reads client socket → sends MuxMsg::Data with
  wseq directly to mux → sends InflightEntry to download task. Fully
  independent, never blocked by downloads.
- Download task (inline): processes replies, sends refill polls (timer),
  accepts InflightEntry. Never blocked by uploads.
- Lock-free mpsc channels throughout — no Mutex contention.

Write ordering (wseq):
- Client assigns monotonic wseq to data-bearing ops only (not polls).
- Tunnel-node buffers out-of-order writes per session, flushes in wseq
  order. Backward compatible: old clients without wseq write immediately.
- Fixes data corruption from pipelined batches completing out of order.

Upload accumulation:
- Adaptive: 50ms initial window for small messages (low latency).
- If >= 32KB accumulated, extend to 1s / 1MB cap (fat uploads for files).
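
As a sketch, the window selection could look like this (function name and return shape are assumptions):

```rust
use std::time::Duration;

/// Adaptive upload accumulation: small messages ship after a 50ms window;
/// once 32KB has been read the op is treated as a bulk upload and the
/// window extends to 1s with a 1MB cap.
fn accumulation_window(bytes_so_far: usize) -> (Duration, usize) {
    if bytes_so_far >= 32 * 1024 {
        (Duration::from_secs(1), 1024 * 1024)  // fat upload: extended window
    } else {
        (Duration::from_millis(50), 32 * 1024) // initial window: low latency
    }
}
```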

Other:
- Removed consecutive_empty gate on refill (was killing idle sessions).
- Tunnel-node reader buffer 2MB (was 512KB).
- Removed legacy detection (was false-triggering on merged replies).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduces upload chunk size to prevent large video uploads from starving
heartbeat polls in shared batches. Adaptive accumulation:
- 50ms initial window, 10ms per-read gap timeout
- >= 8KB triggers extended 1s window (capped at 256KB)
- Smaller chunks clear batches faster, heartbeats get through

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Mux channel unbounded (was 512) — prevents upload flood from blocking
  download task's poll sends and ack processing
- Pipeline debug functions no-op'd — std::sync::Mutex was blocking tokio
  workers under contention during heavy uploads
- Upload accumulation yields between reads
- Added batch response mismatch logging (r.len vs sent ops)
- Open issue: r.len()=0 from Apps Script during heavy uploads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Upload semaphore: max 3 unacked data ops per session (TCP-like flow
  control). Permit held in inflight future until reply arrives.
- Suppress refill polls while data ops are in flight — prevents upload
  acks from being delayed behind slow poll responses in pending_writes.
- data_ops_in_flight counter tracks active upload ops per session.
- upload_cap config field (default 3, not yet wired to Android UI).
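
A sketch of that window using tokio's Semaphore (names are illustrative; in the real client the permit is stored with the in-flight op and dropped when its reply arrives):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

/// TCP-like flow control: at most `upload_cap` unacked data ops per session.
struct UploadWindow {
    permits: Arc<Semaphore>,
}

impl UploadWindow {
    fn new(upload_cap: usize) -> Self {
        Self { permits: Arc::new(Semaphore::new(upload_cap)) }
    }

    /// Waits (asynchronously) once `upload_cap` data ops are unacked.
    async fn send_data_op(&self, data: Vec<u8>, send: impl Fn(Vec<u8>)) {
        let permit = self
            .permits
            .clone()
            .acquire_owned()
            .await
            .expect("semaphore closed");
        send(data);
        // Here the permit is released immediately; the real client keeps it
        // alive inside the in-flight future until the batch reply arrives.
        drop(permit);
    }
}
```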

Root cause of video upload stall: r.len()=0 batch responses from Apps
Script when batches are large (19+ ops). Needs Apps Script investigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The split upload/download task architecture caused video upload stalls:
upload ack responses were delayed behind slow poll responses in the
pending_writes ordering buffer. The single-loop naturally serializes
uploads with reply processing, giving steady ack delivery.

Single-loop keeps all pipelining benefits (elevated polls, adaptive
depth, fast-path uploads) while avoiding the ordering issue.

Removed dead upload_cap config field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
yyoyoian-pixel force-pushed the feat/pipeline-tunnel-polls branch from 5c010ed to 377add3 on May 14, 2026 22:01
yyoyoian-pixel marked this pull request as ready for review on May 15, 2026 16:58
Owner

therealaleph left a comment


Verified locally before merge:

  • cargo test --lib (240 passed)
  • cargo test --manifest-path tunnel-node/Cargo.toml (38 passed)
  • cargo build --release
  • cargo build --manifest-path tunnel-node/Cargo.toml --release
  • cargo build --bin mhrv-rs-ui --release --features ui
  • ANDROID_HOME=/Library/Android/sdk ANDROID_SDK_ROOT=/Library/Android/sdk JAVA_HOME=/Applications/Android\ Studio.app/Contents/jbr/Contents/Home ./gradlew :app:compileDebugKotlin

Also pushed a small maintainer fix to the PR branch for the test-only SessionInner initializers, the capped TCP drain loop, and desktop block_stun config/UI wiring.


Answered via LLM, Supervised @therealaleph

therealaleph merged commit 919b13b into therealaleph:main on May 16, 2026
1 check passed
therealaleph added a commit that referenced this pull request May 16, 2026
Ship PR #1115 from @yyoyoian-pixel: adaptive pipelined Full-mode polls, wseq-ordered tunnel-node writes, default STUN/TURN UDP blocking for faster WebRTC TCP fallback, and Android/desktop config support for the new block_stun path.

Local release gates passed on macOS: cargo test --lib, tunnel-node tests, cargo build --release, tunnel-node release build, desktop UI release build, and Android compileDebugKotlin with Android Studio JBR and the local SDK.