feat(envd): persist command output and add per-command log retrieval #2935

Draft
mishushakov wants to merge 5 commits into main from mishushakov/command-logs-method

@mishushakov (Member) commented Jun 5, 2026

Commands started via envd previously had their stdout/stderr discarded once no client was streaming. This persists each command's output through the existing envd → Loki log pipeline as process_output lines, stamped with the command's pid and capped per command (with a truncation marker). A new GET /v2/sandboxes/{sandboxID}/commands/{pid}/logs endpoint retrieves a single command's output, filtered by pid + event_type=process_output within a start/end time window — the window disambiguates a reused pid, since within [start, end] a pid maps to one execution. The pid is already returned in StartEvent, so no proto change is needed; the envd version is bumped (0.6.1 → 0.6.2) for the output-persistence behavior. PTY/interactive sessions are out of scope; includes unit tests for output line-buffering/cap and the pid query filter.

🤖 Generated with Claude Code
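
For readers following along, here is a minimal sketch of the line-buffering and shared per-command cap described above. It is not the code in this PR: the type names, the marker text, and the `emit` callback are hypothetical, and the real logic in output_log.go may differ (for example in how it treats a final line without a trailing newline).

```go
package main

import (
	"bytes"
	"fmt"
)

// truncationMarker is a hypothetical marker text; the PR does not state the real one.
const truncationMarker = "[output truncated]"

// outputBudget is shared by the stdout and stderr loggers of one command.
type outputBudget struct {
	remaining int  // bytes still allowed for this command
	truncated bool // set once the marker has been emitted
}

// lineLogger buffers raw writes and emits only complete lines.
type lineLogger struct {
	pid    uint32
	stream string
	budget *outputBudget
	buf    []byte // partial line waiting for its newline
	emit   func(pid uint32, stream, line string)
}

// Write implements io.Writer so the logger can be attached next to the live stream.
func (l *lineLogger) Write(p []byte) (int, error) {
	if l.budget.truncated {
		return len(p), nil // budget exhausted: drop silently
	}
	l.buf = append(l.buf, p...)
	for {
		i := bytes.IndexByte(l.buf, '\n')
		if i < 0 {
			break
		}
		l.emitLine(string(l.buf[:i]))
		l.buf = l.buf[i+1:]
	}
	return len(p), nil
}

// emitLine charges the line against the budget, or emits the marker exactly once.
func (l *lineLogger) emitLine(line string) {
	b := l.budget
	if b.truncated {
		return
	}
	if len(line) > b.remaining {
		b.truncated = true
		l.emit(l.pid, l.stream, truncationMarker)
		return
	}
	b.remaining -= len(line)
	l.emit(l.pid, l.stream, line)
}

func main() {
	budget := &outputBudget{remaining: 10}
	out := &lineLogger{pid: 1256, stream: "stdout", budget: budget,
		emit: func(pid uint32, stream, line string) {
			fmt.Printf("pid=%d stream=%s line=%q\n", pid, stream, line)
		}}
	out.Write([]byte("hello\nwor"))
	out.Write([]byte("ld\nthis line is too long\nmore\n"))
}
```

Sharing one `outputBudget` between the stdout and stderr loggers is what makes the cap per command rather than per stream; a real implementation would also need locking if both streams are written concurrently.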

Commands started via envd now get a unique cid that is returned in the
StartEvent and stamped on every stdout/stderr log line, so their output is
persisted through the existing Loki pipeline and capped per command. Adds
GET /v2/sandboxes/{sandboxID}/commands/{cid}/logs to retrieve a single
command's output, with the cid filter threaded through the local Loki query
and the remote edge contract.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cla-bot cla-bot Bot added the cla-signed label Jun 5, 2026

cursor Bot commented Jun 5, 2026

PR Summary

Medium Risk
Increases log volume from every non-PTY command and changes envd behavior in production sandboxes; retrieval correctness depends on callers supplying a tight start/end window when pids are reused.

Overview
This PR makes non-PTY command stdout/stderr durable in the sandbox log pipeline and exposes it through a dedicated retrieval API. envd now always records process output as process_output log lines tagged with pid and stream, with a shared per-command byte budget and truncation marker, while live streaming behavior is unchanged. GetSandboxLogs gains an optional pid filter that flows through cluster resources, the edge API, and Loki (pid + event_type=process_output). Clients can call GET /v2/sandboxes/{sandboxID}/commands/{pid}/logs with start/end (and the usual log query params) to fetch one command’s output; the time window is meant to disambiguate reused pids. OpenAPI-generated clients and mocks are regenerated; envd is bumped to 0.6.2.

Reviewed by Cursor Bugbot for commit 9eef414. Bugbot is set up for automated code reviews on this repo. Configure here.
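
To make the filter concrete, the sketch below builds a LogQL query that selects one command's output by pid and event_type=process_output. The `sandbox_id` stream label and the `json` parsing stage are assumptions for illustration; the query actually built in provider.go is not shown in this thread and may be shaped differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// commandOutputQuery builds a LogQL query for one command's output.
// Assumptions: log lines are JSON, sandboxes are selected by a sandbox_id
// stream label, and pid / event_type are fields extracted by the json stage.
func commandOutputQuery(sandboxID string, pid uint32) string {
	return fmt.Sprintf(
		`{sandbox_id=%q} | json | pid=%q | event_type="process_output"`,
		sandboxID, strconv.FormatUint(uint64(pid), 10),
	)
}

func main() {
	fmt.Println(commandOutputQuery("sbx", 1256))
}
```

The start/end window would be passed as the query's time range rather than inside the query string, which is why it does not appear here.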

@gemini-code-assist Bot (Contributor) left a comment

Code Review

In packages/api/internal/handlers/command_logs.go, the expression new(time.UnixMilli(*params.Cursor)) is invalid Go syntax because new expects a type rather than a value. This will cause a compilation error, which can be resolved by converting the timestamp to a time.Time value first and then taking its address.

Comment on lines +50 to +53
var cursor *time.Time
if params.Cursor != nil {
	cursor = new(time.UnixMilli(*params.Cursor))
}

critical

The expression new(time.UnixMilli(*params.Cursor)) is invalid Go syntax because new expects a type rather than a value or function call. To resolve this compilation error, convert the timestamp to a time.Time value first and then assign its address to the cursor pointer.

Suggested change
- var cursor *time.Time
- if params.Cursor != nil {
- 	cursor = new(time.UnixMilli(*params.Cursor))
- }
+ var cursor *time.Time
+ if params.Cursor != nil {
+ 	t := time.UnixMilli(*params.Cursor)
+ 	cursor = &t
+ }
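
For reference, the suggested form can be wrapped in a small helper that compiles on any Go version. The helper name `cursorFromMillis` is hypothetical, and the sketch assumes the cursor parameter is an optional Unix-millisecond `int64`. Note that since Go 1.26 the built-in `new` also accepts an expression as its operand, so whether the original line fails to compile depends on the Go version the module declares; the stack trace further down this thread shows a go/1.26.3 toolchain.

```go
package main

import (
	"fmt"
	"time"
)

// cursorFromMillis converts an optional Unix-millisecond cursor into an
// optional time.Time. A nil input yields a nil cursor.
func cursorFromMillis(ms *int64) *time.Time {
	if ms == nil {
		return nil
	}
	t := time.UnixMilli(*ms) // take the address of a local, not of the call
	return &t
}

func main() {
	ms := int64(1500)
	fmt.Println(cursorFromMillis(nil) == nil)
	fmt.Println(cursorFromMillis(&ms).UnixMilli())
}
```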

Comment thread packages/shared/pkg/logs/loki/provider.go

codecov Bot commented Jun 5, 2026

❌ 3 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 2752 | 3 | 2749 | 5 |
View the top 1 failed test(s) by shortest run time
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestCommandKillNextApp
Stack Traces | 190s run time
=== RUN   TestCommandKillNextApp
=== PAUSE TestCommandKillNextApp
=== CONT  TestCommandKillNextApp
    process_test.go:28: Command [npx] output: event:{start:{pid:1256}}
Executing command /bin/bash in sandbox i415gce8wadp9pmmlrbok
    process_test.go:28: Command [npx] output: event:{data:{stderr:"npm WARN exec The following package was not found and will be installed: create-next-app@16.2.7\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"Creating a new Next.js app in .../home/user/nextapp.\n\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"Using npm.\n\nInitializing project with template: app-tw \n\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"\nInstalling dependencies:\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"- next\n- react\n- react-dom\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"\nInstalling devDependencies:\n- @tailwindcss/postcss\n- @types/node\n- @types/react\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"- @types/react-dom\n- eslint\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"- eslint-config-next\n- tailwindcss\n- typescript\n\n"}}
    process_test.go:29: 
        	Error Trace:	.../tests/envd/process_test.go:29
        	Error:      	Received unexpected error:
        	            	failed to execute command npx in sandbox ivnfk4b8ylk21o985h30k: invalid_argument: protocol error: incomplete envelope: unexpected EOF
        	Test:       	TestCommandKillNextApp
--- FAIL: TestCommandKillNextApp (189.68s)
View the full list of 2 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 54.21% (Passed 913 times, Failed 1081 times)

Stack Traces | 59.7s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:27: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (59.72s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 54.28% (Passed 903 times, Failed 1072 times)

Stack Traces | 197s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1253}}
Executing command bash in sandbox i16ixtmh12v50f636wvuq (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory before tmpfs mount: 184 MB\nFree memory before tmpfs mount: 800 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Memory to use in integrity test (60% of free, min 64MB): 480 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"480+0 records in\n480+0 records out\n503316480 bytes (503 MB, 480 MiB) copied, 2.35969 s, 213 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=480\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 2.33\n\tPercent of CPU this job got: 98%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:02.36\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2716\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 3\n\tMinor (reclaiming a frame) page faults: 345\n\tVoluntary context switches: 4\n\tInvoluntary context switches: 11\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 670 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox id4md7jfi9p20krlv6pkc
Executing command bash in sandbox id4md7jfi9p20krlv6pkc (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1270}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{data:{stdout:"d9634e428cb255f7871ef89430c03b39550112956bad354ead196a78fab99c70\n"}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:80: Command [bash] completed successfully in sandbox id4md7jfi9p20krlv6pkc
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1273}}
Executing command bash in sandbox iv9hns4v9kzxqg304u3to (user: root)
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:81
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox id4md7jfi9p20krlv6pkc: unavailable: HTTP status 502 Bad Gateway
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:78
        	            				.../tests/orchestrator/sandbox_memory_integrity_test.go:110
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (196.57s)

github-actions Bot and others added 2 commits June 5, 2026 19:58
The cid is stamped on process_start/process_end lifecycle lines too, so
filtering by cid alone returned them alongside output. Add an
event_type=process_output filter so the command-logs endpoint returns only
stdout/stderr.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread packages/envd/internal/services/process/handler/output_log.go
Drop the envd-assigned cid (and its proto changes); commands are already
identified by the pid returned in StartEvent. Output lines are now stamped
with pid, and the retrieval endpoint becomes
GET /v2/sandboxes/{sandboxID}/commands/{pid}/logs with start/end query params.
The time window disambiguates a reused pid: within [start, end] a pid maps to
a single command execution. Filter stays scoped to event_type=process_output.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
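
The effect of combining the pid, the event_type filter, and the time window can be illustrated with an in-memory filter. This is only a sketch of the selection rule from the commit message: the `logLine` struct and the millisecond timestamps are hypothetical, and the real filtering happens in the Loki query, not in Go code like this.

```go
package main

import "fmt"

// logLine is a hypothetical, simplified view of one persisted log entry.
type logLine struct {
	UnixMilli int64
	PID       uint32
	EventType string
	Message   string
}

// commandOutput keeps only process_output lines for pid inside [startMs, endMs].
// Lifecycle lines (process_start/process_end) and other pids are dropped.
func commandOutput(lines []logLine, pid uint32, startMs, endMs int64) []logLine {
	var out []logLine
	for _, l := range lines {
		if l.PID != pid || l.EventType != "process_output" {
			continue
		}
		if l.UnixMilli < startMs || l.UnixMilli > endMs {
			continue
		}
		out = append(out, l)
	}
	return out
}

func main() {
	lines := []logLine{
		{UnixMilli: 1000, PID: 1256, EventType: "process_start", Message: "start"},
		{UnixMilli: 2000, PID: 1256, EventType: "process_output", Message: "first run"},
		{UnixMilli: 2500, PID: 1270, EventType: "process_output", Message: "other command"},
		{UnixMilli: 9000, PID: 1256, EventType: "process_output", Message: "second run, reused pid"},
	}
	for _, l := range commandOutput(lines, 1256, 1000, 5000) {
		fmt.Println(l.Message)
	}
}
```

The last entry shows why the window matters: it carries the same pid but belongs to a later execution, so a window that is too wide would return both runs mixed together.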

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 28467c0. Configure here.

Comment thread packages/envd/internal/services/process/handler/output_log.go