Skip to content

feat(sandbox): log connection attempts that bypass proxy path#326

Merged
johntmyers merged 6 commits intomainfrom
feat/268-bypass-detection/jomyers
Mar 16, 2026
Merged

feat(sandbox): log connection attempts that bypass proxy path#326
johntmyers merged 6 commits intomainfrom
feat/268-bypass-detection/jomyers

Conversation

@johntmyers
Copy link
Collaborator

🏗️ build-from-issue-agent

Summary

Add iptables LOG + REJECT rules inside the sandbox network namespace to detect and diagnose direct connection attempts that bypass the HTTP CONNECT proxy. Applications now get immediate ECONNREFUSED instead of a 30-second timeout, and a /dev/kmsg monitor emits structured BYPASS_DETECT tracing events with destination, protocol, process identity, and actionable hints.

Related Issue

Closes #268

Changes

  • crates/openshell-sandbox/src/sandbox/linux/netns.rs: Add install_bypass_rules() method with iptables OUTPUT chain rules (ACCEPT proxy, ACCEPT loopback, ACCEPT established, LOG+REJECT TCP, LOG+REJECT UDP). Includes run_iptables_netns() helper and iptables_available() check. IPv6 rules mirrored via ip6tables.
  • crates/openshell-sandbox/src/bypass_monitor.rs (NEW): Background /dev/kmsg reader that parses iptables LOG lines, resolves process identity via procfs::resolve_tcp_peer_identity(), emits structured tracing::warn!() events, and feeds DenialEvent with denial_stage: "bypass" to the denial aggregator.
  • crates/openshell-sandbox/src/lib.rs: Register bypass_monitor module. Call install_bypass_rules() after namespace creation. Clone denial channel sender for the monitor. Spawn bypass monitor after proxy startup.
  • crates/openshell-sandbox/src/denial_aggregator.rs: Updated denial_stage documentation to include "bypass".
  • examples/bring-your-own-container/Dockerfile: Add iptables to system packages.
  • architecture/sandbox.md: Added bypass detection section with rules, monitor lifecycle, event format, and graceful degradation.
  • architecture/sandbox-custom-containers.md: Documented iptables as optional dependency.

Deviations from Plan

None — implemented as planned.

External Dependency

The sandbox base image in the openshell-community repo also needs iptables installed. This was handled in NVIDIA/OpenShell-Community#36 (merged).

Testing

  • mise run pre-commit passes
  • Unit tests added/updated

Tests added:

  • Unit: crates/openshell-sandbox/src/bypass_monitor.rs — 11 tests covering kmsg line parsing (TCP, UDP, IPv6, missing fields, wrong namespace, unrelated messages), field extraction, and protocol-specific hint generation
  • Integration: crates/openshell-sandbox/src/sandbox/linux/netns.rs — existing namespace test preserved; iptables rule tests require root (#[ignore])
  • E2E: N/A — no changes under e2e/

Checklist

  • Follows Conventional Commits
  • Architecture docs updated
  • Graceful degradation for missing iptables or /dev/kmsg
  • Userland cannot modify iptables rules (verified: setuid to sandbox user clears CAP_NET_ADMIN)

Documentation updated:

  • architecture/sandbox.md: Added bypass detection section
  • architecture/sandbox-custom-containers.md: Added iptables as optional dependency

Add iptables LOG + REJECT rules inside the sandbox network namespace to
detect and diagnose direct connection attempts that bypass the HTTP
CONNECT proxy. This provides two improvements:

1. Fast-fail UX: applications get immediate ECONNREFUSED instead of a
   30-second timeout when they bypass the proxy
2. Diagnostics: a /dev/kmsg monitor emits structured BYPASS_DETECT
   tracing events with destination, protocol, process identity, and
   actionable hints

Both TCP and UDP bypass attempts are covered (UDP catches DNS bypass).
The feature degrades gracefully if iptables or /dev/kmsg are unavailable.

Closes #268
The sandbox base image runs Python 3.13. A stale venv on 3.12 causes
all exec_python E2E tests to fail because cloudpickle bytecode is not
compatible across minor versions.
The fast deploy's helm upgrade was missing the hostGatewayIP value that
the bootstrap entrypoint injects into the HelmChart CR. This caused
host.openshell.internal hostAliases to be lost from the gateway pod
and sandbox pods after any fast deploy, breaking host gateway routing.

Read the IP from the HelmChart CR and pass it through to helm upgrade.
The Drop impl for NetworkNamespace was accidentally deleted during the
bypass detection refactor, which would cause network namespaces and
veth interfaces to leak on every sandbox shutdown.

Also removes dead kmsg volume/mount code (bypass monitor uses dmesg
instead of direct /dev/kmsg access) and removes an accidentally
committed session transcript file.
@johntmyers johntmyers requested a review from pimlock March 16, 2026 06:14
@johntmyers
Copy link
Collaborator Author

xt_LOG is not available on Docker Desktop for Mac so the logs don't come through. but immediate blocking via iptables is working.

@johntmyers johntmyers merged commit d84a8f7 into main Mar 16, 2026
9 checks passed
@johntmyers johntmyers deleted the feat/268-bypass-detection/jomyers branch March 16, 2026 07:32
drew pushed a commit that referenced this pull request Mar 16, 2026
* feat(sandbox): log connection attempts that bypass proxy path

Add iptables LOG + REJECT rules inside the sandbox network namespace to
detect and diagnose direct connection attempts that bypass the HTTP
CONNECT proxy. This provides two improvements:

1. Fast-fail UX: applications get immediate ECONNREFUSED instead of a
   30-second timeout when they bypass the proxy
2. Diagnostics: a /dev/kmsg monitor emits structured BYPASS_DETECT
   tracing events with destination, protocol, process identity, and
   actionable hints

Both TCP and UDP bypass attempts are covered (UDP catches DNS bypass).
The feature degrades gracefully if iptables or /dev/kmsg are unavailable.

Closes #268

* chore: track .python-version to pin Python 3.13.12 for uv

The sandbox base image runs Python 3.13. A stale venv on 3.12 causes
all exec_python E2E tests to fail because cloudpickle bytecode is not
compatible across minor versions.

* fix(cluster): preserve hostGatewayIP across fast deploys

The fast deploy's helm upgrade was missing the hostGatewayIP value that
the bootstrap entrypoint injects into the HelmChart CR. This caused
host.openshell.internal hostAliases to be lost from the gateway pod
and sandbox pods after any fast deploy, breaking host gateway routing.

Read the IP from the HelmChart CR and pass it through to helm upgrade.

* wip: fix iptables path resolution, use dmesg for kmsg, add CAP_SYSLOG

* fix(sandbox): restore NetworkNamespace Drop impl, remove dead kmsg code

The Drop impl for NetworkNamespace was accidentally deleted during the
bypass detection refactor, which would cause network namespaces and
veth interfaces to leak on every sandbox shutdown.

Also removes dead kmsg volume/mount code (bypass monitor uses dmesg
instead of direct /dev/kmsg access) and removes an accidentally
committed session transcript file.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(sandbox): log connection attempts that bypass proxy path

2 participants