Update cluster to 1.8 (Nomad) and 1.17 (Consul) #2896

Merged
jakubno merged 2 commits into main from update-nomad-and-consul on Jun 2, 2026

Conversation

@sitole (Member) commented Jun 2, 2026

Details on the Nomad and Consul update flow are in #2870.

Now we need to go to 1.8 (Nomad) and 1.17 (Consul), which are still backward-compatible with the current versions.
Once this is set up and running on all nodes, we will do two more updates to reach the LTS versions: 1.21.5 (Consul) and 1.10.5 (Nomad).

cursor Bot commented Jun 2, 2026

PR Summary

High Risk
Cluster-wide Nomad/Consul upgrades and raw_exec cgroup changes can break scheduling or existing allocations until nodes are rebuilt and jobs redeployed.

Overview
Bumps Consul and Nomad versions baked into AWS and GCP cluster images, and adjusts Nomad client and job specs for the newer Nomad behavior. Client raw_exec plugin config drops no_cgroups, and orchestrator / template-manager tasks get explicit memory reservations plus a high memory_max ceiling to reduce OOM kills under the updated scheduler.

Reviewed by Cursor Bugbot for commit dfb1f35.

gemini-code-assist Bot (Contributor) left a comment

Code Review

Setting memory_max to 1 TB (1024 * 1024 MB) in orchestrator.hcl and template-manager.hcl defeats cgroup-based memory containment. If a task experiences a memory leak, it will exhaust the host's physical memory and trigger the kernel OOM killer to terminate critical system processes like Nomad or Consul, rather than allowing Nomad to contain and restart the failing task. These limits should be set to a reasonable value relative to the host size, such as 2048 MB.
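The arithmetic behind this concern can be sketched in Go. This is only an illustration, not code from the PR: the function name `ceilingCanTrigger` and the 64 GiB host size are hypothetical, and the check encodes just the reviewer's reasoning that a ceiling at or above the host's physical memory cannot be reached before the host itself runs out.

```go
package main

import "fmt"

// Nomad expresses memory and memory_max in MB.
// This is the 1 TB ceiling discussed in the review: 1024 * 1024 MB.
const reviewedMemoryMaxMB int64 = 1024 * 1024

// ceilingCanTrigger reports whether a task's memory_max (in MB) is low
// enough that the task's own limit would be reached before the host's
// physical memory is exhausted. Hypothetical helper for illustration.
func ceilingCanTrigger(memoryMaxMB, hostMemoryMB int64) bool {
	return memoryMaxMB < hostMemoryMB
}

func main() {
	// hostMemoryMB is an assumed 64 GiB node, not a value taken from this PR.
	const hostMemoryMB int64 = 64 * 1024

	// Compare the reviewed ceiling with the 2048 MB value the review suggests.
	for _, ceiling := range []int64{reviewedMemoryMaxMB, 2048} {
		fmt.Printf("memory_max=%d MB on a %d MB host: ceiling can trigger = %t\n",
			ceiling, hostMemoryMB, ceilingCanTrigger(ceiling, hostMemoryMB))
	}
}
```

Under these assumptions the 1 TB ceiling never triggers, while the 2048 MB ceiling does. What a reasonable ceiling is depends on the real node size and on what else runs on the host.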


Comment thread iac/modules/job-orchestrator/jobs/orchestrator.hcl
Comment thread iac/modules/job-template-manager/jobs/template-manager.hcl
codecov Bot commented Jun 2, 2026

❌ 6 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 2708 | 6 | 2702 | 5 |
View the full list of 6 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/api/sandboxes::TestSandboxListPaginationRunningLargerLimit

Flake rate in main: 41.35% (Passed 824 times, Failed 581 times)

Stack Traces | 96.2s run time
=== RUN   TestSandboxListPaginationRunningLargerLimit
    sandbox_list_test.go:327: Created sandbox 1/12: iveudlw9nx4u2vlbtbhke
    sandbox_list_test.go:327: Created sandbox 2/12: ie7t6mso1ml4w0ehgw0id
    sandbox_list_test.go:327: Created sandbox 3/12: iiciewy37s7gt1aj2ons2
    sandbox_list_test.go:327: Created sandbox 4/12: iveh57tls26zsialm1qk9
    sandbox_list_test.go:327: Created sandbox 5/12: i28djulnoermgv4qzru1w
    sandbox_list_test.go:327: Created sandbox 6/12: iawbjp4j7p4h708bjmq7b
    sandbox_list_test.go:327: Created sandbox 7/12: ist9f4cirzoc75890ql1e
    sandbox_list_test.go:327: Created sandbox 8/12: izbwzhknip1twaeg4pagg
    sandbox_list_test.go:327: Created sandbox 9/12: i4wkrtrjlgqd6n786eeh4
    sandbox_list_test.go:327: Created sandbox 10/12: ig9x9pts4c3g95bmrcapj
    sandbox_list_test.go:327: Created sandbox 11/12: ism5f0k43ismoql6wgz0y
    sandbox_list_test.go:327: Created sandbox 12/12: i5rbu20jvtwzar53fzlig
    sandbox_list_test.go:330: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:340
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	"[]" should have 12 item(s), but has 0
    sandbox_list_test.go:330: 
        	Error Trace:	.../api/sandboxes/sandbox_list_test.go:330
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxListPaginationRunningLargerLimit
--- FAIL: TestSandboxListPaginationRunningLargerLimit (96.23s)
github.com/e2b-dev/infra/tests/integration/internal/tests/envd::TestCommandKillNextApp

Flake rate in main: 41.16% (Passed 812 times, Failed 568 times)

Stack Traces | 304s run time
=== RUN   TestCommandKillNextApp
=== PAUSE TestCommandKillNextApp
=== CONT  TestCommandKillNextApp
Executing command npx in sandbox ieiavm8ruq71m0lrqnng0
    process_test.go:28: Command [npx] output: event:{start:{pid:1265}}
    process_test.go:28: Command [npx] output: event:{data:{stderr:"npm"}}
    process_test.go:28: Command [npx] output: event:{data:{stderr:" WARN exec The following package was not found and will be installed: create-next-app@16.2.7\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"Creating a new Next.js app in .../home/user/nextapp.\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"Using npm.\n\nInitializing project with template: app-tw \n\n"}}
    process_test.go:28: Command [npx] output: event:{data:{stdout:"\nInstalling dependencies:\n- next\n- react\n- react-dom\n\nInstalling devDependencies:\n- @tailwindcss/postcss\n- @types/node\n- @types/react\n- @types/react-dom\n- eslint\n- eslint-config-next\n- tailwindcss\n- typescript\n\n"}}
    process_test.go:28: Command [npx] output: event:{keepalive:{}}
    process_test.go:28: Command [npx] output: event:{keepalive:{}}
    process_test.go:28: Command [npx] output: event:{keepalive:{}}
    process_test.go:29: 
        	Error Trace:	.../tests/envd/process_test.go:29
        	Error:      	Received unexpected error:
        	            	failed to execute command npx in sandbox ieiavm8ruq71m0lrqnng0: invalid_argument: protocol error: incomplete envelope: unexpected EOF
        	Test:       	TestCommandKillNextApp
--- FAIL: TestCommandKillNextApp (303.97s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 56.17% (Passed 817 times, Failed 1047 times)

Stack Traces | 62.4s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:27: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (62.43s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 56.26% (Passed 807 times, Failed 1038 times)

Stack Traces | 192s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1261}}
Executing command bash in sandbox iqdo4fopxgsvcky6hm2wq (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\nUsed memory before tmpfs mount: 195 MB\nFree memory before tmpfs mount: 789 MB\nMemory to use in integrity test (60% of free, min 64MB): 473 MB\n"}}
Executing command bash in sandbox iqdo4fopxgsvcky6hm2wq (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"473+0 records in\n473+0 records out\n495976448 bytes (496 MB, 473 MiB) copied, 2.01162 s, 247 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=473\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 2.00\n\tPercent of CPU this job got: 99%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:02.01\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2640\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 2\n\tMinor (reclaiming a frame) page faults: 342\n\tVoluntary context switches: 3\n\tInvoluntary context switches: 7\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 671 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox isxniim6vim7s0ic8c69t
Executing command bash in sandbox isxniim6vim7s0ic8c69t (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1277}}
Executing command bash in sandbox iw1a43ydr3rb8xp3gou4z (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{data:{stdout:"a912691870b5650c265676d03067d3d80e783ddbb77dcdbcc593640d30a63159\n"}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:80: Command [bash] completed successfully in sandbox isxniim6vim7s0ic8c69t
Executing command bash in sandbox iw1a43ydr3rb8xp3gou4z (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1280}}
Executing command bash in sandbox isxniim6vim7s0ic8c69t (user: root)
[previous line repeated 81 more times]
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:81
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox isxniim6vim7s0ic8c69t: unavailable: HTTP status 502 Bad Gateway
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:78
        	            				.../tests/orchestrator/sandbox_memory_integrity_test.go:110
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (192.17s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestEnvdAccessTokenAutoResumeViaProxy

Flake rate in main: 41.43% (Passed 813 times, Failed 575 times)

Stack Traces | 10.8s run time
=== RUN   TestEnvdAccessTokenAutoResumeViaProxy
=== PAUSE TestEnvdAccessTokenAutoResumeViaProxy
=== CONT  TestEnvdAccessTokenAutoResumeViaProxy
    traffic_access_token_test.go:357: 
        	Error Trace:	.../tests/proxies/traffic_access_token_test.go:357
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestEnvdAccessTokenAutoResumeViaProxy
--- FAIL: TestEnvdAccessTokenAutoResumeViaProxy (10.76s)
github.com/e2b-dev/infra/tests/integration/internal/tests/proxies::TestSandboxAutoResumeViaProxy

Flake rate in main: 41.91% (Passed 811 times, Failed 585 times)

Stack Traces | 15s run time
=== RUN   TestSandboxAutoResumeViaProxy
=== PAUSE TestSandboxAutoResumeViaProxy
=== CONT  TestSandboxAutoResumeViaProxy
    auto_resume_test.go:116: 
        	Error Trace:	.../tests/proxies/auto_resume_test.go:116
        	Error:      	Received unexpected error:
        	            	Get "http://localhost:3002": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        	Test:       	TestSandboxAutoResumeViaProxy
--- FAIL: TestSandboxAutoResumeViaProxy (15.04s)
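
The flake rates in this report are consistent with failed runs divided by total runs (passed plus failed). That formula is inferred from the numbers shown here, not taken from Codecov's documentation, and `flakeRate` is a hypothetical helper name. A minimal Go sketch of the calculation:

```go
package main

import "fmt"

// flakeRate returns failed / (passed + failed) as a percentage.
// The formula is an assumption inferred from the reported figures.
func flakeRate(passed, failed int) float64 {
	total := passed + failed
	if total == 0 {
		return 0 // no runs recorded, so no meaningful rate
	}
	return 100 * float64(failed) / float64(total)
}

func main() {
	// Pass/fail counts copied from the report for two of the tests.
	fmt.Printf("TestSandboxListPaginationRunningLargerLimit: %.2f%%\n", flakeRate(824, 581))
	fmt.Printf("TestSandboxMemoryIntegrity: %.2f%%\n", flakeRate(817, 1047))
}
```

With the counts above this gives 41.35% and 56.17%, matching the reported rates for those two tests.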


chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dfb1f35e46


Comment thread iac/modules/job-orchestrator/jobs/orchestrator.hcl
@jakubno enabled auto-merge (squash) June 2, 2026 08:21
@sitole changed the title from "Update nomad and consul" to "Update cluster 1.8 (Nomad) and 1.17 (Consul)" Jun 2, 2026
@sitole changed the title from "Update cluster 1.8 (Nomad) and 1.17 (Consul)" to "Update cluster to 1.8 (Nomad) and 1.17 (Consul)" Jun 2, 2026
Comment thread iac/provider-gcp/nomad-cluster/scripts/run-nomad.sh
@jakubno merged commit 34dcd14 into main Jun 2, 2026
51 checks passed
@jakubno deleted the update-nomad-and-consul branch June 2, 2026 08:39
dobrac added a commit that referenced this pull request Jun 3, 2026