feat(orchestrator): add allocated resource metrics for sandboxes #2943

Merged: jakubno merged 1 commit into main from chore/allocated-metric on Jun 8, 2026

@jakubno jakubno commented Jun 7, 2026

Add observable gauges for CPU, memory, and disk allocated to running sandboxes on the orchestrator node, under the orchestrator.sandbox.* prefix. Computed from a single Sandboxes.Items() iteration.

@cla-bot cla-bot Bot added the cla-signed label Jun 7, 2026

cursor Bot commented Jun 7, 2026

PR Summary

Low Risk
Read-only observability on startup registration; main risk is callback iteration without nil guards if the sandbox map can contain nil entries.

Overview
Adds orchestrator OpenTelemetry observable gauges for the total vCPU, memory, and disk allocated to all running sandboxes on the node (orchestrator.sandbox.cpu.allocated, .memory.allocated, .disk.allocated). A single RegisterCallback updates all three by summing the values in each sandbox’s Config, converting memory and disk from MB to bytes. The metric names, descriptions, and units are registered in shared telemetry alongside the existing orchestrator gauges.

Unlike List, the allocation callback does not skip nil sandbox entries, so a nil value in Sandboxes.Items() could cause a panic when Config is read.
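A minimal sketch of the guarded aggregation this comment has in mind, not the code from the PR. The field names Vcpu, RamMB, and TotalDiskSizeMB come from the review below. The `Sandbox` and `Totals` types, the `int64` field types, the map shape, and the reading of "MB" as 1024 * 1024 bytes are all assumptions made for illustration.

```go
package main

import "fmt"

// Config mirrors the three fields named in the review; the int64 types are assumed.
type Config struct {
	Vcpu            int64
	RamMB           int64
	TotalDiskSizeMB int64
}

// Sandbox is a hypothetical stand-in for the orchestrator's sandbox type.
type Sandbox struct {
	Config *Config
}

// Totals holds node-level sums; memory and disk are expressed in bytes.
type Totals struct {
	CPU         int64
	MemoryBytes int64
	DiskBytes   int64
}

// bytesPerMB assumes "MB" means 1024 * 1024 bytes; the real code may differ.
const bytesPerMB = 1024 * 1024

// SumAllocated walks the map once and skips nil sandboxes and nil configs,
// so a nil entry cannot cause a nil pointer dereference.
func SumAllocated(items map[string]*Sandbox) Totals {
	var t Totals
	for _, sbx := range items {
		if sbx == nil || sbx.Config == nil {
			continue
		}
		t.CPU += sbx.Config.Vcpu
		t.MemoryBytes += sbx.Config.RamMB * bytesPerMB
		t.DiskBytes += sbx.Config.TotalDiskSizeMB * bytesPerMB
	}
	return t
}

func main() {
	items := map[string]*Sandbox{
		"a": {Config: &Config{Vcpu: 2, RamMB: 512, TotalDiskSizeMB: 1024}},
		"b": nil, // would panic on sbx.Config without the guard above
		"c": {Config: &Config{Vcpu: 4, RamMB: 1024, TotalDiskSizeMB: 2048}},
	}
	fmt.Printf("%+v\n", SumAllocated(items))
}
```

Whether the guard is needed depends on whether the sandbox map can ever hold a nil entry, which this page does not settle.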

Reviewed by Cursor Bugbot for commit 1c6567b.


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

The callback registered in the orchestrator server accesses nested pointers on server.sandboxFactory and the sandbox items without checking for nil, which can lead to a nil pointer dereference panic during metric collection.


Comment thread packages/orchestrator/pkg/server/main.go

codecov Bot commented Jun 7, 2026

❌ 2 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 2746 | 2 | 2744 | 7 |
View the full list of 5 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestSandboxListPaginationRunningLargerLimit

Flake rate in main: 39.43% (Passed 920 times, Failed 599 times)

Stack Traces | 95.3s run time
=== RUN   TestSandboxListPaginationRunningLargerLimit
    sandbox_list_test.go:327: Created sandbox 1/12: in7yhpb6mk3p0g8jb065t
    sandbox_list_test.go:327: Created sandbox 2/12: id62p9y1girk0kcngaymg
    sandbox_list_test.go:327: Created sandbox 3/12: iswah2ntt0kiyd2cex6qh
    sandbox_list_test.go:327: Created sandbox 4/12: i79hoydrbn6jzl2ftu8yx
    sandbox_list_test.go:327: Created sandbox 5/12: iabkkrn9riff9xtgyluf9
    sandbox_list_test.go:327: Created sandbox 6/12: igh59e1ys7wyy9him8znx
    sandbox_list_test.go:327: Created sandbox 7/12: i8cx2jcfuuz6jqz7icyl5
    sandbox_list_test.go:327: Created sandbox 8/12: izhyda9pfnfpjyhiwg8u8
    sandbox_list_test.go:327: Created sandbox 9/12: i5t3xkc3n9opp41b78q55
    sandbox_list_test.go:327: Created sandbox 10/12: i8xwjr03li45umkxn6dps
    sandbox_list_test.go:327: Created sandbox 11/12: ibx2gfaknd0t5it8rptny
    sandbox_list_test.go:327: Created sandbox 12/12: iyp40r6tc1kes9gbj5v73
    sandbox_list_test.go:330: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:340
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	"[]" should have 12 item(s), but has 0
    sandbox_list_test.go:330: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:330
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxListPaginationRunningLargerLimit
--- FAIL: TestSandboxListPaginationRunningLargerLimit (95.31s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 54.21% (Passed 913 times, Failed 1081 times)

Stack Traces | 68.1s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:27: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (68.14s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 54.28% (Passed 903 times, Failed 1072 times)

Stack Traces | 210s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1258}}
Executing command bash in sandbox inni4vamp6ihqpj724nqb (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 185 MB\nFree memory before tmpfs mount: 799 MB\nMemory to use in integrity test (60% of free, min 64MB): 479 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"479+0 records in\n479+0 records out\n502267904 bytes (502 MB, 479 MiB) copied, 4.61821 s, 109 MB/s"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\t"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"C"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"o"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"m"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"m"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"a"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"d"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"b"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"e"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"i"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"g"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"t"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"i"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"m"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"e"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"d"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:":"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\""}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"dd"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"if=/dev/urandom"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:" of=/mnt/testfile bs=1M count=479\"\n\tUser time (seconds): "}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"0.00"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\t"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"System time (sec"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"onds): 4.47\n\tPercent of CPU this job got: 96%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:04.63\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2700\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 3\n\tMinor (reclaiming a frame) page faults: 344\n\tVoluntary context switches: 5\n\tInvoluntary context switches: 45\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 672 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox ih537ye6wadolr9rbyvtd
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1275}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{data:{stdout:"45f3db2092948d69475be63503813b7aa03dceeca24f9073b545663b89dba60a\n"}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:80: Command [bash] completed successfully in sandbox ih537ye6wadolr9rbyvtd
Executing command bash in sandbox ipi4b5w4soh8jrgtwzcyh (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1278}}
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
Executing command bash in sandbox ih537ye6wadolr9rbyvtd (user: root)
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:81
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox ih537ye6wadolr9rbyvtd: unavailable: HTTP status 502 Bad Gateway
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:78
        	            				.../tests/orchestrator/sandbox_memory_integrity_test.go:110
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (209.86s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestEnvdAccessTokenAutoResumeViaProxy

Flake rate in main: 39.32% (Passed 909 times, Failed 589 times)

Stack Traces | 11.2s run time
=== RUN   TestEnvdAccessTokenAutoResumeViaProxy
=== PAUSE TestEnvdAccessTokenAutoResumeViaProxy
=== CONT  TestEnvdAccessTokenAutoResumeViaProxy
    traffic_access_token_test.go:357: 
        	Error Trace:	.../tests/proxies/traffic_access_token_test.go:357
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestEnvdAccessTokenAutoResumeViaProxy
--- FAIL: TestEnvdAccessTokenAutoResumeViaProxy (11.23s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestSandboxAutoResumeViaProxy

Flake rate in main: 39.77% (Passed 907 times, Failed 599 times)

Stack Traces | 12s run time
=== RUN   TestSandboxAutoResumeViaProxy
=== PAUSE TestSandboxAutoResumeViaProxy
=== CONT  TestSandboxAutoResumeViaProxy
    auto_resume_test.go:116: 
        	Error Trace:	.../tests/proxies/auto_resume_test.go:116
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestSandboxAutoResumeViaProxy
--- FAIL: TestSandboxAutoResumeViaProxy (11.99s)


@jakubno jakubno marked this pull request as ready for review June 7, 2026 06:50

@claude claude Bot left a comment


LGTM — straightforward observability addition following the existing gauge registration pattern.

Extended reasoning...

Overview

The PR adds three OpenTelemetry observable gauges (orchestrator.sandbox.cpu.allocated, .memory.allocated, .disk.allocated) registered via a single meter.RegisterCallback in packages/orchestrator/pkg/server/main.go. The callback iterates Sandboxes.Items() once and sums each sandbox's Config.Vcpu, Config.RamMB, and Config.TotalDiskSizeMB (converting MB to bytes). Corresponding name/description/unit entries are added to packages/shared/pkg/telemetry/meters.go.
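The single-callback shape described above can be sketched with the standard library alone. The `gauge` and `observer` types here are hypothetical stand-ins, not the OpenTelemetry API, and `newAllocationCallback`, `sandboxConfig`, and the 1024 * 1024 conversion factor are likewise assumptions. Only the metric names and the Config field names come from the PR.

```go
package main

import "fmt"

// gauge and observer are hypothetical stand-ins for an observable gauge and the
// observer passed to a metrics callback. They are NOT the OpenTelemetry API.
type gauge string

type observer map[gauge]int64

func (o observer) observe(g gauge, v int64) { o[g] = v }

// Metric names as listed in the PR.
const (
	cpuAllocated    gauge = "orchestrator.sandbox.cpu.allocated"
	memoryAllocated gauge = "orchestrator.sandbox.memory.allocated"
	diskAllocated   gauge = "orchestrator.sandbox.disk.allocated"
)

// sandboxConfig carries the three fields named in the review; types are assumed.
type sandboxConfig struct{ Vcpu, RamMB, TotalDiskSizeMB int64 }

// newAllocationCallback returns one callback that reports all three gauges from
// a single pass over whatever items() returns at collection time.
func newAllocationCallback(items func() []sandboxConfig) func(observer) {
	const bytesPerMB = 1024 * 1024 // assumption: "MB" means 1024 * 1024 bytes
	return func(o observer) {
		var cpu, mem, disk int64
		for _, c := range items() {
			cpu += c.Vcpu
			mem += c.RamMB * bytesPerMB
			disk += c.TotalDiskSizeMB * bytesPerMB
		}
		o.observe(cpuAllocated, cpu)
		o.observe(memoryAllocated, mem)
		o.observe(diskAllocated, disk)
	}
}

func main() {
	running := []sandboxConfig{{Vcpu: 2, RamMB: 512, TotalDiskSizeMB: 1024}}
	cb := newAllocationCallback(func() []sandboxConfig { return running })

	first := observer{}
	cb(first) // simulated collection cycle

	// A sandbox starts between collections; the next cycle picks it up.
	running = append(running, sandboxConfig{Vcpu: 1, RamMB: 256, TotalDiskSizeMB: 512})
	second := observer{}
	cb(second)

	fmt.Println(first[cpuAllocated], second[cpuAllocated])
}
```

Because the values are computed when the callback runs rather than when sandboxes start or stop, the three gauges always reflect the same snapshot of the sandbox set. That is the apparent motivation for using one callback instead of three.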

Security risks

None. This is read-only telemetry registration on startup. No new inputs are accepted, no auth/crypto paths are touched, and no data is exposed beyond aggregate node-level resource totals.

Level of scrutiny

Low — this is purely additive observability code in a hot-path-adjacent but non-mutating callback. The metric registration, units, and descriptions follow the existing conventions in the same file, and the callback mirrors the structure of the statusGauge callback registered immediately above it.

Other factors

The Cursor/Gemini bots flagged the absence of nil checks on item.Config. The inline comment was already marked resolved, and the existing callback for statusGauge (just above) dereferences server.info without nil checks too — so this follows the project's established pattern that these structures are non-nil once New returns. The 0% patch coverage from codecov is expected for telemetry registration, which is integration-tested rather than unit-tested.

@jakubno jakubno merged commit 95cb6d3 into main Jun 8, 2026
55 checks passed
@jakubno jakubno deleted the chore/allocated-metric branch June 8, 2026 09:23