Commit 82f89ea

Merge pull request #340 from deploystackio/main
prod-deploy
2 parents 697f51e + 0a2eb4e commit 82f89ea

5 files changed: +48 -42 lines changed

development/satellite/architecture.mdx

Lines changed: 8 additions & 8 deletions
@@ -351,7 +351,7 @@ See [Log Capture](/development/satellite/log-capture) for buffering implementati
 
 **Resource Management (stdio processes):**
 - **nsjail Isolation**: PID, network, filesystem isolation in production
-- **Resource Quotas**: 512MB physical RAM (cgroup), 2048MB virtual (rlimit), 60s CPU time, 1000 processes
+- **Resource Quotas**: virtual RAM unlimited (rlimit_as=inf), 512MB physical RAM via cgroup when enabled, 60s CPU time, 1000 processes
 - **Development Mode**: Direct spawn() without isolation for cross-platform development
 
 **Authentication & Authorization:**
@@ -375,10 +375,10 @@ See [Team Isolation](/development/satellite/team-isolation) for complete impleme
 - **Instance Isolation**: Processes tracked by `team_id` AND `user_id` with independent lifecycles
 - **ProcessId Format**: `{server_slug}-{team_slug}-{user_slug}-{installation_id}`
 - **Tool Discovery**: Automatic tool caching with per-user namespacing
-- **Resource Limits**: nsjail in production with cgroup enforcement
-- 512MB physical memory (cgroup_mem_max) - Precise control over actual memory usage
-- 2048MB virtual memory (rlimit_as) - Fallback for systems without cgroup support
-- 1000 processes (cgroup_pids_max) - Adequate for package managers
+- **Resource Limits**: nsjail in production with auto-detected cgroup enforcement
+- Virtual memory unlimited (rlimit_as=inf) — Node.js WASM requires ~10GB virtual address space
+- 512MB physical memory (cgroup_mem_max) — active only when systemd `Delegate=yes` is configured
+- 1000 processes (cgroup_pids_max + rlimit_nproc) — adequate for package managers
 - 60s CPU time limit
 - Runtime-specific cache directories: `/mcp-cache/node/{team_id}`, `/mcp-cache/python/{team_id}`
 - **Development Mode**: Plain spawn() on all platforms for easy debugging
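
The ProcessId format listed in this hunk is a plain hyphen-joined string. A minimal sketch of how such an ID could be assembled follows; the function name and the sample slugs are hypothetical and not taken from the satellite code.

```bash
# Hypothetical helper: joins the four components in the documented order
# {server_slug}-{team_slug}-{user_slug}-{installation_id}.
build_process_id() {
  server_slug="$1"
  team_slug="$2"
  user_slug="$3"
  installation_id="$4"
  printf '%s-%s-%s-%s\n' "$server_slug" "$team_slug" "$user_slug" "$installation_id"
}

# Sample values are made up for illustration.
build_process_id filesystem acme alice inst42
```

If slugs can themselves contain hyphens, an ID built this way cannot be split back into its parts reliably, so it is probably best treated as an opaque key.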
@@ -420,12 +420,12 @@ Configuration → Spawn → Monitor → Health Check → Restart/Terminate
 - **Error Handling**: Complete HTTP status code mapping
 
 ### Current Resource Isolation Specifications
-- **Physical Memory Limit**: 512MB RAM per MCP server process via cgroup (precise enforcement)
-- **Virtual Memory Limit**: 2048MB RAM via rlimit (fallback for non-cgroup systems)
+- **Virtual Memory Limit**: unlimited (rlimit_as=inf) — Node.js v24 WASM (undici HTTP parser) reserves ~10GB virtual address space; this is virtual, not physical RAM
+- **Physical Memory Limit**: 512MB per MCP server process via cgroup — active only when satellite runs as a systemd service with `Delegate=yes`; falls back to rlimit-only otherwise
 - **CPU Limit**: 60s CPU time limit
 - **Process Limit**: 1000 processes per MCP server (accommodates package managers like npm, uvx)
 - **Process Timeout**: 3-minute idle timeout for automatic cleanup
-- **Isolation Method**: nsjail with Linux namespaces (PID, mount, UTS, IPC) and cgroup enforcement
+- **Isolation Method**: nsjail with Linux namespaces (PID, mount, UTS, IPC); cgroup v2 auto-detected at startup
 - **Runtime-Aware Caching**: Separate cache directories per runtime (`/mcp-cache/node/{team_id}`, `/mcp-cache/python/{team_id}`)
 
 ### Technology Stack
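
The two enforcement modes described in the architecture.mdx hunks above (rlimit-only, or rlimits plus cgroup) map onto two sets of nsjail flags. The sketch below is illustrative only: the satellite's real argument builder is not part of this commit, the function name is hypothetical, and the cgroup path is left as a parameter because the path the satellite actually passes is not shown. The `--rlimit_*` options are standard nsjail command-line options; the cgroup flags are the ones named in the removed lines of production-satellite.mdx in this commit.

```bash
# Sketch only: prints nsjail resource-limit flags for the two documented modes.
# $1 = "yes" when a delegated cgroup v2 subtree was detected, anything else otherwise
# $2 = cgroup v2 path to hand to nsjail (assumption: resolved by the caller)
nsjail_limit_args() {
  cgroup_available="$1"
  cgroup_path="$2"
  # rlimits apply in both modes; "inf" removes the virtual address space cap.
  args="--rlimit_as inf --rlimit_cpu 60 --rlimit_nproc 1000"
  if [ "$cgroup_available" = "yes" ]; then
    # 536870912 bytes = 512MB physical memory; 1000 = cgroup pids.max.
    args="$args --use_cgroupv2 --cgroupv2_mount $cgroup_path"
    args="$args --cgroup_mem_max 536870912 --cgroup_pids_max 1000"
  fi
  printf '%s\n' "$args"
}

nsjail_limit_args no ""
nsjail_limit_args yes /sys/fs/cgroup
```

Keeping the rlimit flags unconditional mirrors the documented fallback: when cgroups are unavailable, only the second group of flags is dropped.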

development/satellite/process-management.mdx

Lines changed: 4 additions & 4 deletions
@@ -57,7 +57,7 @@ The system automatically selects the appropriate spawning mode based on environm
 
 **nsjail Spawn (Production Linux):**
 - **Resource limits** (rlimit-based):
-- Virtual memory: 2048MB (rlimit_as)
+- Virtual memory: unlimited (rlimit_as=inf — required for Node.js WASM, which reserves ~10GB virtual address space)
 - CPU time: 60 seconds (rlimit_cpu)
 - Processes: 1000 (rlimit_nproc)
 - File descriptors: 1024 (rlimit_nofile)
@@ -527,7 +527,7 @@ TeamIsolationService provides:
 - Handshake timeout: 30 seconds
 
 **Resource Usage:**
-- Memory per process: Base ~10-20MB (application-dependent, limited to 2048MB virtual via rlimit_as)
+- Memory per process: Base ~10-20MB (application-dependent; virtual address space unlimited, physical RAM capped at 512MB via cgroup when enabled)
 - Runtime-aware cache isolation: Separate cache directories per runtime (`/mcp-cache/node/{team_id}`, `/mcp-cache/python/{team_id}`)
 - Event-driven architecture: Handles multiple processes concurrently
 - CPU overhead: Minimal (background event loop processing)
@@ -572,16 +572,16 @@ LOG_LEVEL=debug npm run dev
 
 **Resource Limits (Production):**
 - nsjail enforces hard limits via rlimits:
-- Virtual memory: 2048MB (rlimit_as)
+- Virtual memory: unlimited (rlimit_as=inf — Node.js WASM requires ~10GB virtual address space)
 - CPU time: 60 seconds (rlimit_cpu)
 - Processes: 1000 (rlimit_nproc)
 - File descriptors: 1024 (rlimit_nofile)
 - File size: 50MB (rlimit_fsize)
 - tmpfs for /tmp: 100MB limit
 - tmpfs for GitHub deployments (/app): 300MB kernel-enforced quota
+- Physical memory: 512MB per process via cgroup — auto-detected at startup; active only when satellite runs as a systemd service with `Delegate=yes` (see [Enable Cgroup Limits](/self-hosted/production-satellite#enable-cgroup-limits))
 - Prevents resource exhaustion attacks
 - Runtime-specific cache directories prevent cross-runtime contamination
-- **Note**: Cgroup limits (physical memory 512MB) are currently disabled due to systemd delegation permissions
 
 **Namespace Isolation (Production):**
 - Complete process isolation per team and runtime

self-hosted/docker-compose.mdx

Lines changed: 3 additions & 3 deletions
@@ -196,9 +196,9 @@ The satellite must be deployed separately after completing the setup wizard:
 
 <Info>
 **When to set `DEPLOYSTACK_SATELLITE_URL`:**
-- Required when MCP clients (Claude Code, VS Code, etc.) connect via a domain or IP address
-- Not needed for local development on localhost
-- Use base URL only (e.g., `https://satellite.example.com`) - no `/mcp` or `/sse` paths
+- The Docker image defaults to `http://localhost:3001`, which works for local development
+- Override with `-e DEPLOYSTACK_SATELLITE_URL="https://satellite.example.com"` when MCP clients connect via a domain or IP address
+- Use base URL only, with no `/mcp` or `/sse` paths
 - Required for OAuth authentication to work with remote MCP clients
 </Info>
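
The "base URL only" rule for `DEPLOYSTACK_SATELLITE_URL` can be checked before the value is handed to the container. This is a minimal sketch with a hypothetical helper name; it only strips one trailing slash and a trailing `/mcp` or `/sse` segment and does no other validation.

```bash
# Hypothetical helper: reduces a satellite URL to its base form.
normalize_satellite_url() {
  url="${1%/}"                      # drop a single trailing slash, if present
  case "$url" in
    */mcp) url="${url%/mcp}" ;;     # remove a trailing /mcp path
    */sse) url="${url%/sse}" ;;     # remove a trailing /sse path
  esac
  printf '%s\n' "$url"
}

normalize_satellite_url "https://satellite.example.com/mcp"
normalize_satellite_url "http://localhost:3001"
```

The result could then be passed on with `-e DEPLOYSTACK_SATELLITE_URL="$(normalize_satellite_url "$raw_url")"`, where `raw_url` is whatever value the operator supplied.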

self-hosted/production-satellite.mdx

Lines changed: 30 additions & 24 deletions
@@ -21,7 +21,7 @@ For **development or single-team** usage, the [Docker Compose setup](/self-hoste
 Production satellites provide enterprise-grade security through:
 
 - **nsjail Process Isolation**: Complete process separation per team with Linux namespaces and cgroup enforcement
-- **Resource Limits**: CPU, memory, and process limits per MCP server (512MB physical RAM via cgroup, 2GB virtual RAM via rlimit, 60s CPU, 1000 processes)
+- **Resource Limits**: CPU, memory, and process limits per MCP server (virtual RAM unlimited via rlimit, 512MB physical RAM via cgroup when enabled, 60s CPU, 1000 processes)
 - **Multi-Runtime Support**: Node.js (npx) and Python (uvx) with runtime-aware isolation
 - **Filesystem Jailing**: Read-only system directories, isolated writable spaces per runtime
 - **Non-Root Execution**: Satellite runs as dedicated `deploystack` user
@@ -327,8 +327,8 @@ EVENT_BATCH_INTERVAL_MS=3000
 EVENT_MAX_BATCH_SIZE=100
 
 # nsjail Resource Limits
-NSJAIL_MEMORY_LIMIT_MB=2048 # Virtual memory limit (rlimit fallback)
-NSJAIL_CGROUP_MEM_MAX_BYTES=536870912 # Physical memory limit: 512MB (cgroup)
+NSJAIL_MEMORY_LIMIT_MB=inf # Virtual memory limit — "inf" required for Node.js WASM (undici reserves ~10GB virtual address space)
+NSJAIL_CGROUP_MEM_MAX_BYTES=536870912 # Physical memory limit: 512MB (cgroup, only active with Delegate=yes in systemd unit)
 NSJAIL_CPU_TIME_LIMIT_SECONDS=60 # CPU time limit
 NSJAIL_MAX_PROCESSES=1000 # Process limit (rlimit)
 NSJAIL_CGROUP_PIDS_MAX=1000 # Process limit (cgroup)
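
The `NSJAIL_CGROUP_MEM_MAX_BYTES` value in this hunk is 512MB expressed in bytes. A quick arithmetic check, using a hypothetical helper name, can be handy when choosing a different cap:

```bash
# Hypothetical helper: converts megabytes (MiB, 1024 * 1024 bytes) to bytes.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mb_to_bytes 512   # prints 536870912
```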
@@ -543,16 +543,17 @@ Production satellites use nsjail to provide:
 
 Each MCP server process is limited to:
 
-- **Virtual Memory**: 2048MB (enforced via rlimit_as)
+- **Virtual Memory**: unlimited (rlimit_as = `inf` — required because Node.js v24 uses WASM internally which reserves ~10GB of virtual address space; this is virtual, not physical RAM)
+- **Physical Memory**: 512MB via cgroup (only active when `Delegate=yes` is set in the systemd unit — see below)
 - **CPU Time**: 60 seconds (enforced via rlimit_cpu)
-- **Processes**: 1000 (enforced via rlimit_nproc, required for package managers like npm and uvx)
+- **Processes**: 1000 (enforced via rlimit_nproc and cgroup pids.max, required for package managers like npm and uvx)
 - **File Descriptors**: 1024 (enforced via rlimit_nofile)
 - **Maximum File Size**: 50MB (enforced via rlimit_fsize)
 - **tmpfs /tmp**: 100MB (enforced via tmpfs mount)
 
-<Warning>
-**Cgroup Limits Currently Disabled**: Physical memory (512MB) and process count cgroup limits are disabled due to systemd cgroup delegation permissions. The satellite uses rlimit-based resource controls instead, which provide equivalent DoS protection. See the Future Enhancement section below for re-enabling cgroup limits via systemd delegation.
-</Warning>
+<Info>
+**Cgroup limits are auto-detected**: The satellite automatically detects whether cgroup v2 is available and delegated. When running as a systemd service with `Delegate=yes`, physical memory (512MB) and PID limits are enforced via cgroup in addition to rlimits. Without `Delegate=yes`, the satellite falls back to rlimit-only mode — nsjail still runs safely with full namespace isolation. See the [Enable Cgroup Limits](#enable-cgroup-limits) section below to activate precise physical memory enforcement.
+</Info>
 
 <Info>
 **Primary Security = Namespace Isolation**: The satellite's security model relies on Linux namespaces (PID, Mount, User, IPC, UTS) to isolate MCP servers from each other and the host system. Resource limits (rlimits) provide secondary DoS protection. With user namespace active, all privilege escalation attacks (including setuid-based rlimit bypasses) are prevented.
@@ -771,9 +772,9 @@ Monitor and plan for:
 - Log disk usage growth
 - Network bandwidth for backend communication
 
-## Future Enhancement: Enable Cgroup Limits
+## Enable Cgroup Limits
 
-Currently, cgroup limits are disabled due to systemd cgroup delegation permissions. To re-enable precise physical memory (512MB) and process count limits in the future:
+By default the satellite runs in rlimit-only mode. Adding `Delegate=yes` to the systemd unit gives the satellite ownership of its cgroup subtree, which activates precise physical memory (512MB) and PID enforcement per MCP process. **No code changes are needed**: the satellite auto-detects cgroup availability at startup.
 
 ### 1. Modify Systemd Service File
 
@@ -796,32 +797,37 @@ sudo systemctl daemon-reload
 sudo systemctl restart deploystack-satellite
 ```
 
-### 3. Update Satellite Code
+### 3. Verify Cgroup Limits Are Active
+
+Check the startup log for confirmation:
 
-The satellite code will need to be updated to re-enable cgroup v2 flags:
-- `--use_cgroupv2`
-- `--cgroupv2_mount /sys/fs/cgroup`
-- `--cgroup_mem_max 536870912` (512MB)
-- `--cgroup_pids_max 1000`
+```bash
+sudo grep "cgroup_detection" /var/log/deploystack-satellite/satellite.log
+```
 
-Contact DeployStack support or check the GitHub repository for updated releases that support cgroup delegation.
+You should see a line like:
+```
+Cgroup v2 available at /sys/fs/cgroup/system.slice/deploystack-satellite.service — memory/PID limits will be enforced
+```
 
-### 4. Verify Cgroup Limits
+If you see `Cgroup v2 unavailable` instead, verify that `Delegate=yes` is in the service file and that you reloaded systemd.
 
-After restart, verify cgroup limits are active:
+You can also check active limits on a running MCP process:
 
 ```bash
-# Find running MCP process
+# Find a running MCP process PID
 ps aux | grep "npx.*mcp"
 
-# Check cgroup limits (replace {pid} with actual PID)
+# Check its cgroup assignment (replace {pid} with actual PID)
 cat /proc/{pid}/cgroup
-cat /sys/fs/cgroup/NSJAIL.*/memory.max
-cat /sys/fs/cgroup/NSJAIL.*/pids.max
+
+# Check enforced limits
+cat /sys/fs/cgroup/system.slice/deploystack-satellite.service/NSJAIL.*/memory.max
+cat /sys/fs/cgroup/system.slice/deploystack-satellite.service/NSJAIL.*/pids.max
 ```
 
 <Info>
-**Note**: This enhancement is optional. The current rlimit-based approach provides strong security through namespace isolation and adequate DoS protection. Cgroup limits add precision to resource accounting but don't change the fundamental security model.
+**Cgroup limits are optional.** The rlimit-only default provides strong security through namespace isolation and adequate DoS protection. Cgroup limits add precise physical memory enforcement per MCP process, which is useful in high-density multi-team environments where a single runaway process consuming all RAM would otherwise affect other teams.
 </Info>
 
 ## Next Steps
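
When the startup log reports `Cgroup v2 unavailable`, one way to confirm the delegation setting is to ask systemd for the unit's `Delegate` property. The sketch below assumes the unit is named `deploystack-satellite`, as in the restart command in the hunk above; `delegate_enabled` is a hypothetical helper, and the check says nothing about whether the satellite itself has picked the setting up.

```bash
# Hypothetical helper: succeeds when a `systemctl show` output line reports delegation.
delegate_enabled() {
  [ "$1" = "Delegate=yes" ]
}

# `systemctl show --property=Delegate` prints a single "Delegate=..." line.
# The value falls back to an empty string when systemctl or the unit is unavailable.
line="$(systemctl show deploystack-satellite --property=Delegate 2>/dev/null || true)"

if delegate_enabled "$line"; then
  echo "Delegate=yes is active"
else
  echo "Delegate=yes is not active (or the unit was not found)"
fi
```

If the property is reported as active but the log still shows the fallback message, restarting the service after `systemctl daemon-reload` is the step the documentation change points to.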

self-hosted/quick-start.mdx

Lines changed: 3 additions & 3 deletions
@@ -263,9 +263,9 @@ After completing the basic backend and frontend setup, deploy at least one satel
 
 <Info>
 **When to set `DEPLOYSTACK_SATELLITE_URL`:**
-- Required when MCP clients (Claude Code, VS Code, etc.) connect via a domain or IP address
-- Not needed for local development on localhost
-- Use base URL only (e.g., `https://satellite.example.com`) - no `/mcp` or `/sse` paths
+- The Docker image defaults to `http://localhost:3001`, which works for local development
+- Override with `-e DEPLOYSTACK_SATELLITE_URL="https://satellite.example.com"` when MCP clients connect via a domain or IP address
+- Use base URL only, with no `/mcp` or `/sse` paths
 - Required for OAuth authentication to work with remote MCP clients
 </Info>
