16 changes: 8 additions & 8 deletions development/satellite/architecture.mdx
@@ -351,7 +351,7 @@ See [Log Capture](/development/satellite/log-capture) for buffering implementation

**Resource Management (stdio processes):**
- **nsjail Isolation**: PID, network, filesystem isolation in production
- - **Resource Quotas**: 512MB physical RAM (cgroup), 2048MB virtual (rlimit), 60s CPU time, 1000 processes
+ - **Resource Quotas**: virtual RAM unlimited (rlimit_as=inf), 512MB physical RAM via cgroup when enabled, 60s CPU time, 1000 processes
- **Development Mode**: Direct spawn() without isolation for cross-platform development

**Authentication & Authorization:**
@@ -375,10 +375,10 @@ See [Team Isolation](/development/satellite/team-isolation) for complete implementation
- **Instance Isolation**: Processes tracked by `team_id` AND `user_id` with independent lifecycles
- **ProcessId Format**: `{server_slug}-{team_slug}-{user_slug}-{installation_id}`
- **Tool Discovery**: Automatic tool caching with per-user namespacing
- - **Resource Limits**: nsjail in production with cgroup enforcement
- - 512MB physical memory (cgroup_mem_max) - Precise control over actual memory usage
- - 2048MB virtual memory (rlimit_as) - Fallback for systems without cgroup support
- - 1000 processes (cgroup_pids_max) - Adequate for package managers
+ - **Resource Limits**: nsjail in production with auto-detected cgroup enforcement
+ - Virtual memory unlimited (rlimit_as=inf) — Node.js WASM requires ~10GB virtual address space
+ - 512MB physical memory (cgroup_mem_max) — active only when systemd `Delegate=yes` is configured
+ - 1000 processes (cgroup_pids_max + rlimit_nproc) — adequate for package managers
- 60s CPU time limit
- Runtime-specific cache directories: `/mcp-cache/node/{team_id}`, `/mcp-cache/python/{team_id}`
- **Development Mode**: Plain spawn() on all platforms for easy debugging
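The `ProcessId` format above is mechanical enough to sketch. The helper and slug values below are illustrative assumptions, not satellite code:

```python
def process_id(server_slug: str, team_slug: str, user_slug: str, installation_id: str) -> str:
    # Documented format: {server_slug}-{team_slug}-{user_slug}-{installation_id}
    return f"{server_slug}-{team_slug}-{user_slug}-{installation_id}"

print(process_id("github", "acme", "alice", "inst01"))  # → github-acme-alice-inst01
```

Since slugs may themselves contain hyphens, the format is easy to build but ambiguous to parse back; treating the ProcessId as an opaque key avoids that problem.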
@@ -420,12 +420,12 @@ Configuration → Spawn → Monitor → Health Check → Restart/Terminate
- **Error Handling**: Complete HTTP status code mapping

### Current Resource Isolation Specifications
- - **Physical Memory Limit**: 512MB RAM per MCP server process via cgroup (precise enforcement)
- - **Virtual Memory Limit**: 2048MB RAM via rlimit (fallback for non-cgroup systems)
+ - **Virtual Memory Limit**: unlimited (rlimit_as=inf) — Node.js v24 WASM (undici HTTP parser) reserves ~10GB virtual address space; this is virtual, not physical RAM
+ - **Physical Memory Limit**: 512MB per MCP server process via cgroup — active only when satellite runs as a systemd service with `Delegate=yes`; falls back to rlimit-only otherwise
- **CPU Limit**: 60s CPU time limit
- **Process Limit**: 1000 processes per MCP server (accommodates package managers like npm, uvx)
- **Process Timeout**: 3-minute idle timeout for automatic cleanup
- - **Isolation Method**: nsjail with Linux namespaces (PID, mount, UTS, IPC) and cgroup enforcement
+ - **Isolation Method**: nsjail with Linux namespaces (PID, mount, UTS, IPC); cgroup v2 auto-detected at startup
- **Runtime-Aware Caching**: Separate cache directories per runtime (`/mcp-cache/node/{team_id}`, `/mcp-cache/python/{team_id}`)
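The virtual-vs-physical distinction is easy to demonstrate outside nsjail. The sketch below (Linux, Python's `resource` module, not satellite code) caps the address space and shows that a large reservation fails even though no physical RAM was ever touched:

```python
import resource

GIB = 1024 ** 3
soft, hard = resource.getrlimit(resource.RLIMIT_AS)

# Cap virtual address space at 1 GiB (or the inherited hard limit, if lower).
cap = GIB if hard == resource.RLIM_INFINITY else min(GIB, hard)
resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

try:
    # A 2 GiB reservation exceeds the address-space cap immediately,
    # before any of it could be backed by physical pages. This is the
    # same failure mode a finite rlimit_as caused for Node.js WASM.
    buf = bytearray(2 * GIB)
    result = "allocated"
except MemoryError:
    result = "MemoryError"
finally:
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

print(result)  # → MemoryError
```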

### Technology Stack
8 changes: 4 additions & 4 deletions development/satellite/process-management.mdx
@@ -57,7 +57,7 @@ The system automatically selects the appropriate spawning mode based on environment

**nsjail Spawn (Production Linux):**
- **Resource limits** (rlimit-based):
- - Virtual memory: 2048MB (rlimit_as)
+ - Virtual memory: unlimited (rlimit_as=inf — required for Node.js WASM, which reserves ~10GB virtual address space)
- CPU time: 60 seconds (rlimit_cpu)
- Processes: 1000 (rlimit_nproc)
- File descriptors: 1024 (rlimit_nofile)
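Outside nsjail, the same rlimit profile can be sketched with the standard library. This is an illustrative POSIX sketch of the limits listed above, not the satellite's actual spawner:

```python
import resource
import subprocess

# The documented per-process limits (rlimit_as intentionally unlimited).
LIMITS = [
    (resource.RLIMIT_AS, resource.RLIM_INFINITY),  # virtual memory: unlimited
    (resource.RLIMIT_CPU, 60),                     # CPU time: 60 seconds
    (resource.RLIMIT_NPROC, 1000),                 # processes: 1000
    (resource.RLIMIT_NOFILE, 1024),                # file descriptors: 1024
    (resource.RLIMIT_FSIZE, 50 * 1024 * 1024),     # file size: 50 MB
]

def apply_limits():
    # Runs in the child between fork() and exec(); an unprivileged
    # process may lower limits but never raise them above the hard cap.
    for res, want in LIMITS:
        _soft, hard = resource.getrlimit(res)
        if hard != resource.RLIM_INFINITY and (want == resource.RLIM_INFINITY or want > hard):
            want = hard
        resource.setrlimit(res, (want, hard))

proc = subprocess.run(
    ["sh", "-c", "echo ok"],
    preexec_fn=apply_limits,
    capture_output=True,
    text=True,
)
print(proc.stdout.strip())  # → ok
```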
@@ -527,7 +527,7 @@ TeamIsolationService provides:
- Handshake timeout: 30 seconds

**Resource Usage:**
- - Memory per process: Base ~10-20MB (application-dependent, limited to 2048MB virtual via rlimit_as)
+ - Memory per process: Base ~10-20MB (application-dependent; virtual address space unlimited, physical RAM capped at 512MB via cgroup when enabled)
- Runtime-aware cache isolation: Separate cache directories per runtime (`/mcp-cache/node/{team_id}`, `/mcp-cache/python/{team_id}`)
- Event-driven architecture: Handles multiple processes concurrently
- CPU overhead: Minimal (background event loop processing)
@@ -572,16 +572,16 @@ LOG_LEVEL=debug npm run dev

**Resource Limits (Production):**
- nsjail enforces hard limits via rlimits:
- - Virtual memory: 2048MB (rlimit_as)
+ - Virtual memory: unlimited (rlimit_as=inf — Node.js WASM requires ~10GB virtual address space)
- CPU time: 60 seconds (rlimit_cpu)
- Processes: 1000 (rlimit_nproc)
- File descriptors: 1024 (rlimit_nofile)
- File size: 50MB (rlimit_fsize)
- tmpfs for /tmp: 100MB limit
- tmpfs for GitHub deployments (/app): 300MB kernel-enforced quota
- Physical memory: 512MB per process via cgroup — auto-detected at startup; active only when satellite runs as a systemd service with `Delegate=yes` (see [Enable Cgroup Limits](/self-hosted/production-satellite#enable-cgroup-limits))
- Prevents resource exhaustion attacks
- Runtime-specific cache directories prevent cross-runtime contamination
- - **Note**: Cgroup limits (physical memory 512MB) are currently disabled due to systemd delegation permissions

**Namespace Isolation (Production):**
- Complete process isolation per team and runtime
6 changes: 3 additions & 3 deletions self-hosted/docker-compose.mdx
@@ -196,9 +196,9 @@ The satellite must be deployed separately after completing the setup wizard:

<Info>
**When to set `DEPLOYSTACK_SATELLITE_URL`:**
- - Required when MCP clients (Claude Code, VS Code, etc.) connect via a domain or IP address
- - Not needed for local development on localhost
- - Use base URL only (e.g., `https://satellite.example.com`) - no `/mcp` or `/sse` paths
+ - The Docker image defaults to `http://localhost:3001` which works for local development
+ - Override with `-e DEPLOYSTACK_SATELLITE_URL="https://satellite.example.com"` when MCP clients connect via a domain or IP address
+ - Use base URL only - no `/mcp` or `/sse` paths
- Required for OAuth authentication to work with remote MCP clients
</Info>
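The base-URL rule is easy to check mechanically. `is_valid_satellite_url` is a hypothetical helper for illustration, not part of DeployStack:

```python
from urllib.parse import urlparse

def is_valid_satellite_url(url: str) -> bool:
    # Rule from the note above: scheme and host only, no /mcp or /sse path.
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.path in ("", "/")

print(is_valid_satellite_url("https://satellite.example.com"))      # → True
print(is_valid_satellite_url("https://satellite.example.com/mcp"))  # → False
```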

54 changes: 30 additions & 24 deletions self-hosted/production-satellite.mdx
@@ -21,7 +21,7 @@ For **development or single-team** usage, the [Docker Compose setup](/self-hoste
Production satellites provide enterprise-grade security through:

- **nsjail Process Isolation**: Complete process separation per team with Linux namespaces and cgroup enforcement
- - **Resource Limits**: CPU, memory, and process limits per MCP server (512MB physical RAM via cgroup, 2GB virtual RAM via rlimit, 60s CPU, 1000 processes)
+ - **Resource Limits**: CPU, memory, and process limits per MCP server (virtual RAM unlimited via rlimit, 512MB physical RAM via cgroup when enabled, 60s CPU, 1000 processes)
- **Multi-Runtime Support**: Node.js (npx) and Python (uvx) with runtime-aware isolation
- **Filesystem Jailing**: Read-only system directories, isolated writable spaces per runtime
- **Non-Root Execution**: Satellite runs as dedicated `deploystack` user
@@ -327,8 +327,8 @@ EVENT_BATCH_INTERVAL_MS=3000
EVENT_MAX_BATCH_SIZE=100

# nsjail Resource Limits
- NSJAIL_MEMORY_LIMIT_MB=2048 # Virtual memory limit (rlimit fallback)
- NSJAIL_CGROUP_MEM_MAX_BYTES=536870912 # Physical memory limit: 512MB (cgroup)
+ NSJAIL_MEMORY_LIMIT_MB=inf # Virtual memory limit — "inf" required for Node.js WASM (undici reserves ~10GB virtual address space)
+ NSJAIL_CGROUP_MEM_MAX_BYTES=536870912 # Physical memory limit: 512MB (cgroup, only active with Delegate=yes in systemd unit)
NSJAIL_CPU_TIME_LIMIT_SECONDS=60 # CPU time limit
NSJAIL_MAX_PROCESSES=1000 # Process limit (rlimit)
NSJAIL_CGROUP_PIDS_MAX=1000 # Process limit (cgroup)
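The `inf` sentinel in `NSJAIL_MEMORY_LIMIT_MB` maps naturally onto `RLIM_INFINITY`. A sketch of how a loader might interpret it (the variable name is from the block above; the helper is hypothetical):

```python
import resource

def parse_nsjail_memory_limit_mb(value: str):
    # "inf" disables the rlimit_as cap entirely; any other value is MB.
    if value.strip().lower() == "inf":
        return resource.RLIM_INFINITY
    return int(value) * 1024 * 1024

print(parse_nsjail_memory_limit_mb("inf") == resource.RLIM_INFINITY)  # → True
print(parse_nsjail_memory_limit_mb("2048"))                           # → 2147483648
```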
@@ -543,16 +543,17 @@ Production satellites use nsjail to provide:

Each MCP server process is limited to:

- - **Virtual Memory**: 2048MB (enforced via rlimit_as)
+ - **Virtual Memory**: unlimited (rlimit_as = `inf` — required because Node.js v24 uses WASM internally which reserves ~10GB of virtual address space; this is virtual, not physical RAM)
+ - **Physical Memory**: 512MB via cgroup (only active when `Delegate=yes` is set in the systemd unit — see below)
- **CPU Time**: 60 seconds (enforced via rlimit_cpu)
- - **Processes**: 1000 (enforced via rlimit_nproc, required for package managers like npm and uvx)
+ - **Processes**: 1000 (enforced via rlimit_nproc and cgroup pids.max, required for package managers like npm and uvx)
- **File Descriptors**: 1024 (enforced via rlimit_nofile)
- **Maximum File Size**: 50MB (enforced via rlimit_fsize)
- **tmpfs /tmp**: 100MB (enforced via tmpfs mount)

- <Warning>
- **Cgroup Limits Currently Disabled**: Physical memory (512MB) and process count cgroup limits are disabled due to systemd cgroup delegation permissions. The satellite uses rlimit-based resource controls instead, which provide equivalent DoS protection. See the Future Enhancement section below for re-enabling cgroup limits via systemd delegation.
- </Warning>
+ <Info>
+ **Cgroup limits are auto-detected**: The satellite automatically detects whether cgroup v2 is available and delegated. When running as a systemd service with `Delegate=yes`, physical memory (512MB) and PID limits are enforced via cgroup in addition to rlimits. Without `Delegate=yes`, the satellite falls back to rlimit-only mode — nsjail still runs safely with full namespace isolation. See the [Enable Cgroup Limits](#enable-cgroup-limits) section below to activate precise physical memory enforcement.
+ </Info>

<Info>
**Primary Security = Namespace Isolation**: The satellite's security model relies on Linux namespaces (PID, Mount, User, IPC, UTS) to isolate MCP servers from each other and the host system. Resource limits (rlimits) provide secondary DoS protection. With user namespace active, all privilege escalation attacks (including setuid-based rlimit bypasses) are prevented.
@@ -771,9 +771,9 @@ Monitor and plan for:
- Log disk usage growth
- Network bandwidth for backend communication

- ## Future Enhancement: Enable Cgroup Limits
+ ## Enable Cgroup Limits

- Currently, cgroup limits are disabled due to systemd cgroup delegation permissions. To re-enable precise physical memory (512MB) and process count limits in the future:
+ By default the satellite runs in rlimit-only mode. Adding `Delegate=yes` to the systemd unit gives the satellite ownership of its cgroup subtree, which activates precise physical memory (512MB) and PID enforcement per MCP process. **No code changes are needed** — the satellite auto-detects cgroup availability at startup.

### 1. Modify Systemd Service File

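The unit-file edit collapsed in this step amounts to a single directive. A minimal drop-in sketch follows; the path and unit name are assumptions based on the service name used in this guide:

```ini
# /etc/systemd/system/deploystack-satellite.service.d/10-delegate.conf  (assumed path)
[Service]
Delegate=yes
```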
@@ -796,32 +796,37 @@ sudo systemctl daemon-reload
sudo systemctl restart deploystack-satellite
```

- ### 3. Update Satellite Code
+ ### 3. Verify Cgroup Limits Are Active

+ Check the startup log for confirmation:

- The satellite code will need to be updated to re-enable cgroup v2 flags:
- - `--use_cgroupv2`
- - `--cgroupv2_mount /sys/fs/cgroup`
- - `--cgroup_mem_max 536870912` (512MB)
- - `--cgroup_pids_max 1000`
+ ```bash
+ sudo grep "cgroup_detection" /var/log/deploystack-satellite/satellite.log
+ ```

- Contact DeployStack support or check the GitHub repository for updated releases that support cgroup delegation.
+ You should see a line like:
+ ```
+ Cgroup v2 available at /sys/fs/cgroup/system.slice/deploystack-satellite.service — memory/PID limits will be enforced
+ ```

- ### 4. Verify Cgroup Limits
+ If you see `Cgroup v2 unavailable` instead, verify that `Delegate=yes` is in the service file and that you reloaded systemd.

- After restart, verify cgroup limits are active:
+ You can also check active limits on a running MCP process:

```bash
- # Find running MCP process
+ # Find a running MCP process PID
ps aux | grep "npx.*mcp"

- # Check cgroup limits (replace {pid} with actual PID)
+ # Check its cgroup assignment (replace {pid} with actual PID)
cat /proc/{pid}/cgroup
- cat /sys/fs/cgroup/NSJAIL.*/memory.max
- cat /sys/fs/cgroup/NSJAIL.*/pids.max

+ # Check enforced limits
+ cat /sys/fs/cgroup/system.slice/deploystack-satellite.service/NSJAIL.*/memory.max
+ cat /sys/fs/cgroup/system.slice/deploystack-satellite.service/NSJAIL.*/pids.max
```

<Info>
- **Note**: This enhancement is optional. The current rlimit-based approach provides strong security through namespace isolation and adequate DoS protection. Cgroup limits add precision to resource accounting but don't change the fundamental security model.
+ **Cgroup limits are optional.** The rlimit-only default provides strong security through namespace isolation and adequate DoS protection. Cgroup limits add precise physical memory enforcement per MCP process, which is useful in high-density multi-team environments where a single runaway process consuming all RAM would otherwise affect other teams.
</Info>

## Next Steps
6 changes: 3 additions & 3 deletions self-hosted/quick-start.mdx
@@ -263,9 +263,9 @@ After completing the basic backend and frontend setup, deploy at least one satellite

<Info>
**When to set `DEPLOYSTACK_SATELLITE_URL`:**
- - Required when MCP clients (Claude Code, VS Code, etc.) connect via a domain or IP address
- - Not needed for local development on localhost
- - Use base URL only (e.g., `https://satellite.example.com`) - no `/mcp` or `/sse` paths
+ - The Docker image defaults to `http://localhost:3001` which works for local development
+ - Override with `-e DEPLOYSTACK_SATELLITE_URL="https://satellite.example.com"` when MCP clients connect via a domain or IP address
+ - Use base URL only - no `/mcp` or `/sse` paths
- Required for OAuth authentication to work with remote MCP clients
</Info>
