perf(orch): free guest memory incrementally during snapshot copy #2902

Draft
bchalios wants to merge 3 commits into main from direct-memfd-populate

@bchalios bchalios commented Jun 2, 2026

Summary

On pause we copy guest memory from the memfd into the snapshot cache while the memfd is still fully resident, holding ~2× guest memory for the duration. This adds the option to punch each range out of the memfd (MADV_REMOVE) right after it's copied, keeping peak at ~1×. Gated behind a new flag, off by default.
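The ~2× versus ~1× claim can be checked with a toy accounting model. The sketch below is only an illustration: `peakResident` is a hypothetical helper, not part of the orchestrator, and it assumes that a copied range briefly exists in both the memfd and the cache before it is punched.

```go
package main

import "fmt"

// peakResident simulates copying guestBytes in chunks of rangeBytes and
// returns the highest combined (source + cache) footprint seen along the way.
// It is a toy model of the accounting, not the real copy loop.
func peakResident(guestBytes, rangeBytes int64, punch bool) int64 {
	source, cache := guestBytes, int64(0)
	peak := source + cache
	for off := int64(0); off < guestBytes; off += rangeBytes {
		n := rangeBytes
		if guestBytes-off < n {
			n = guestBytes - off // last chunk may be shorter
		}
		cache += n // the range now exists in both places
		if source+cache > peak {
			peak = source + cache
		}
		if punch {
			source -= n // release the source pages right after the copy
		}
	}
	return peak
}

func main() {
	const guest = 1 << 30 // 1 GiB of guest memory (illustrative)
	const chunk = 2 << 20 // 2 MiB ranges (illustrative)
	fmt.Println("no punch:", peakResident(guest, chunk, false))
	fmt.Println("punch:   ", peakResident(guest, chunk, true))
}
```

In this model the punched variant peaks at guest size plus one in-flight range, which is where the "~1×" figure comes from.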

Commits

  1. Memfd.punchHole — MADV_REMOVE over a huge-page-aligned range (misalignment is a caller bug → error; no clear() fallback since the mapping is PROT_READ).
  2. memfd-punch-on-snapshot flag — BoolFlag, default false.
  3. Wiring — thread punchSource through copyFromMemfd → NewCacheFromMemfd/Async → ExportMemory → pauseProcessMemory; Pause reads the flag.

Scope

  • Memfd path only (the process_vm_readv path has nothing to punch).
  • Non-dedup path only (dedup is page-granular, can't punch at huge-page granularity).

⚠️ Safety

The punch is destructive — it zeroes the pages for every mapping of the fd, including Firecracker's. A failed copy leaves both cache and memfd unusable, so it's only correct on the pause-and-discard path. Before enabling, confirm a failed pause discards the sandbox rather than resuming it. Rollback safety is enforced by convention, not code.
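Since rollback safety currently rests on convention, one way to make the precondition explicit is a single predicate that combines the flag, the scope restrictions, and the teardown guarantee. This is a minimal sketch under assumptions: `punchConfig`, its fields, and `shouldPunch` are hypothetical names and do not mirror the orchestrator's real types.

```go
package main

import "fmt"

// punchConfig collects the conditions listed in this PR description.
// All names here are hypothetical and for illustration only.
type punchConfig struct {
	FlagEnabled      bool // memfd-punch-on-snapshot is on
	UseMemfd         bool // memfd export path (not process_vm_readv)
	Dedup            bool // page-granular dedup path is in use
	DiscardOnFailure bool // a failed pause tears the sandbox down
}

// shouldPunch reports whether punching the source is allowed.
// Every condition must hold; any single false one disables the punch.
func shouldPunch(c punchConfig) bool {
	return c.FlagEnabled && c.UseMemfd && !c.Dedup && c.DiscardOnFailure
}

func main() {
	safe := punchConfig{FlagEnabled: true, UseMemfd: true, DiscardOnFailure: true}
	resumable := safe
	resumable.DiscardOnFailure = false // pause may resume the VM on failure

	fmt.Println(shouldPunch(safe))      // all preconditions hold
	fmt.Println(shouldPunch(resumable)) // punch must stay off
}
```

Passing the teardown guarantee as an explicit input means a caller cannot enable the punch without stating that a failed pause discards the sandbox.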

@cla-bot cla-bot Bot added the cla-signed label Jun 2, 2026

cursor Bot commented Jun 2, 2026

PR Summary

High Risk
Destructive guest memory changes with no code guard against resume after punch; misaligned dirty ranges vs mandatory 2 MiB punch granularity can break snapshot export when the flag is on.

Overview
Adds optional memfd-punch-on-snapshot (default off) so during pause, each dirty guest range can be copied from the memfd into the snapshot cache and then released with MADV_REMOVE, aiming to cap peak memory near one copy of guest RAM instead of two while the copy runs. Memfd.punchHole enforces 2 MiB alignment; the flag is wired through sync and async memfd export and is skipped on memfile dedup and non-memfd export paths.

Memfd.punchHole requires 2 MiB-aligned offsets and lengths, but punching uses the same BitsetRanges spans as the copy, which are only guaranteed to match diff block size (often 4 KiB), so enabling the flag may fail snapshots unless ranges are huge-page-aligned. Punching is destructive to all mappings of the memfd; a partial copy can leave guest memory and cache inconsistent, and rollback after punch is not safe—only appropriate when pause always discards the VM.
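Because a misaligned range only surfaces as an error partway through a destructive copy, one option is to validate every range before the first punch. The following is a hedged sketch of such a preflight check; `byteRange` and `firstMisaligned` are hypothetical stand-ins, not the `BitsetRanges` API.

```go
package main

import "fmt"

// byteRange is a hypothetical stand-in for one dirty span (bytes).
type byteRange struct {
	Start, Size int64
}

// firstMisaligned returns the index of the first range whose start or size
// is not a multiple of align, or -1 when every range is aligned.
// align must be greater than zero.
func firstMisaligned(ranges []byteRange, align int64) int {
	for i, r := range ranges {
		if r.Start%align != 0 || r.Size%align != 0 {
			return i
		}
	}
	return -1
}

func main() {
	const hugePage = 2 << 20 // 2 MiB
	ranges := []byteRange{
		{Start: 0, Size: hugePage},
		{Start: 2 * hugePage, Size: 4096}, // 4 KiB span, not punchable at 2 MiB granularity
	}
	if i := firstMisaligned(ranges, hugePage); i >= 0 {
		fmt.Printf("range %d is not %d-aligned; skip punching before anything is destroyed\n", i, hugePage)
	}
}
```

Running the check up front lets the caller fall back to a plain copy while the memfd is still intact.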

Reviewed by Cursor Bugbot for commit 569570a.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The implementation of punchHole incorrectly hardcodes header.HugepageSize for alignment checks and lacks bounds checking on the slice operation, which can cause snapshot copy failures on non-hugepage sandboxes and lead to runtime panics. Consequently, the call to punchHole in copyFromMemfd must be updated to pass the actual block size parameter to ensure correctness and safety.

Comment on lines +86 to +92
func (m *Memfd) punchHole(off, length int64) error {
	if off%header.HugepageSize != 0 || length%header.HugepageSize != 0 {
		return fmt.Errorf("punch range [%d,%d) not %d-aligned", off, off+length, header.HugepageSize)
	}

	return unix.Madvise(m.mmap[off:off+length], unix.MADV_REMOVE)
}

critical

The current implementation of punchHole hardcodes header.HugepageSize for alignment checks and lacks bounds checking on the slice operation. This will cause the snapshot copy to fail on non-hugepage sandboxes where the block size is 4 KiB, and can lead to runtime panics if the range is out of bounds. Accepting the actual page size as a parameter and validating the slice boundaries ensures both correctness and safety.

Suggested change
-func (m *Memfd) punchHole(off, length int64) error {
-	if off%header.HugepageSize != 0 || length%header.HugepageSize != 0 {
-		return fmt.Errorf("punch range [%d,%d) not %d-aligned", off, off+length, header.HugepageSize)
-	}
-	return unix.Madvise(m.mmap[off:off+length], unix.MADV_REMOVE)
-}
+func (m *Memfd) punchHole(off, length, pageSize int64) error {
+	if off < 0 || length < 0 || off+length > int64(len(m.mmap)) {
+		return fmt.Errorf("punch range [%d,%d) out of bounds (size %d)", off, off+length, len(m.mmap))
+	}
+	if off%pageSize != 0 || length%pageSize != 0 {
+		return fmt.Errorf("punch range [%d,%d) not %d-aligned", off, off+length, pageSize)
+	}
+	return unix.Madvise(m.mmap[off:off+length], unix.MADV_REMOVE)
+}
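
A possible refinement of the bounds check: `off+length` can wrap around for very large `int64` values, in which case the comparison against the mapping size passes and the slice expression still panics. The sketch below compares `length` against the remaining space instead. `checkPunchRange` is a hypothetical standalone helper that takes the mapping size as a parameter; it is not the PR's method.

```go
package main

import "fmt"

// checkPunchRange validates [off, off+length) against a mapping of size bytes
// and a required alignment, without ever computing off+length.
func checkPunchRange(off, length, size, pageSize int64) error {
	if pageSize <= 0 {
		return fmt.Errorf("invalid page size %d", pageSize)
	}
	// length > size-off is the overflow-safe form of off+length > size.
	if off < 0 || length < 0 || off > size || length > size-off {
		return fmt.Errorf("punch range off=%d len=%d out of bounds (size %d)", off, length, size)
	}
	if off%pageSize != 0 || length%pageSize != 0 {
		return fmt.Errorf("punch range off=%d len=%d not %d-aligned", off, length, pageSize)
	}
	return nil
}

func main() {
	const size = 8192
	fmt.Println(checkPunchRange(0, 4096, size, 4096))       // valid range
	fmt.Println(checkPunchRange(4096, 1<<63-1, size, 4096)) // off+length would overflow
	fmt.Println(checkPunchRange(100, 4096, size, 4096))     // misaligned offset
}
```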

Comment on lines +150 to +154
if punchSource {
	if err := memfd.punchHole(r.Start, r.Size); err != nil {
		return fmt.Errorf("punch memfd source [%d,%d): %w", r.Start, r.Start+r.Size, err)
	}
}

critical

Update the call to punchHole to pass the blockSize parameter, which represents the actual page size of the memfd, matching the updated method signature.

Suggested change
-if punchSource {
-	if err := memfd.punchHole(r.Start, r.Size); err != nil {
-		return fmt.Errorf("punch memfd source [%d,%d): %w", r.Start, r.Start+r.Size, err)
-	}
-}
+if punchSource {
+	if err := memfd.punchHole(r.Start, r.Size, blockSize); err != nil {
+		return fmt.Errorf("punch memfd source [%d,%d): %w", r.Start, r.Start+r.Size, err)
+	}
+}

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 21e8968fead17683d77d70ff85ffbf8d09941ae8.

		if err := memfd.punchHole(r.Start, r.Size); err != nil {
			return fmt.Errorf("punch memfd source [%d,%d): %w", r.Start, r.Start+r.Size, err)
		}
	}

Punch misaligned for 4KiB blocks

Medium Severity

With memfd-punch-on-snapshot enabled, copyFromMemfd calls punchHole using each BitsetRanges offset and size at diffMetadata.BlockSize. When the sandbox uses 4 KiB guest pages, those ranges are only 4 KiB-aligned, so punchHole rejects them and pause fails even though the copy path works.
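
For a huge-page-backed memfd whose dirty ranges are finer than 2 MiB, one alternative to rejecting the range is to punch only the huge pages it fully covers. The sketch below shows the rounding; `innerAligned` is a hypothetical helper, and whether leaving the unaligned edges resident is acceptable is an assumption that would need checking against the actual copy loop.

```go
package main

import "fmt"

// innerAligned returns the largest sub-range of [start, start+size) whose
// bounds are multiples of align. ok is false when no whole aligned unit fits.
func innerAligned(start, size, align int64) (off, length int64, ok bool) {
	if align <= 0 || start < 0 || size <= 0 {
		return 0, 0, false
	}
	first := (start + align - 1) / align * align // round the start up
	last := (start + size) / align * align       // round the end down
	if last <= first {
		return 0, 0, false
	}
	return first, last - first, true
}

func main() {
	const hugePage = 2 << 20 // 2 MiB
	// A 4 MiB dirty range that starts 4 KiB into the file.
	off, length, ok := innerAligned(4096, 4<<20, hugePage)
	fmt.Println(off, length, ok)

	// A single 4 KiB block never covers a whole huge page.
	_, _, ok = innerAligned(4096, 4096, hugePage)
	fmt.Println(ok)
}
```

Note that a huge page split across two separate dirty ranges is never released by this approach unless adjacent ranges are merged first, so the memory saving is partial.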


codecov Bot commented Jun 2, 2026

❌ 3 Tests Failed:

Tests completed: 2706 | Failed: 3 | Passed: 2703 | Skipped: 7
View the top 1 failed test(s) by shortest run time
github.com/e2b-dev/infra/packages/shared/pkg/storage::TestMultipartUploader_HighConcurrency_StressTest
Stack Traces | 0.62s run time
=== RUN   TestMultipartUploader_HighConcurrency_StressTest
=== PAUSE TestMultipartUploader_HighConcurrency_StressTest
=== CONT  TestMultipartUploader_HighConcurrency_StressTest
    gcp_multipart_test.go:398: 
        	Error Trace:	.../pkg/storage/gcp_multipart_test.go:398
        	Error:      	"1" is not greater than "1"
        	Test:       	TestMultipartUploader_HighConcurrency_StressTest
        	Messages:   	Should have concurrent uploads
--- FAIL: TestMultipartUploader_HighConcurrency_StressTest (0.62s)
View the full list of 2 ❄️ flaky test(s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity

Flake rate in main: 55.79% (Passed 836 times, Failed 1055 times)

Stack Traces | 67.7s run time
=== RUN   TestSandboxMemoryIntegrity
=== PAUSE TestSandboxMemoryIntegrity
=== CONT  TestSandboxMemoryIntegrity
    sandbox_memory_integrity_test.go:27: Build completed successfully
--- FAIL: TestSandboxMemoryIntegrity (67.74s)
github.com/e2b-dev/infra/tests/integration/internal/tests/orchestrator::TestSandboxMemoryIntegrity/tmpfs_hash

Flake rate in main: 55.88% (Passed 826 times, Failed 1046 times)

Stack Traces | 202s run time
=== RUN   TestSandboxMemoryIntegrity/tmpfs_hash
=== PAUSE TestSandboxMemoryIntegrity/tmpfs_hash
=== CONT  TestSandboxMemoryIntegrity/tmpfs_hash
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{start:{pid:1258}}
Executing command bash in sandbox iwr9jfv93u19hrz5rd1xc (user: root)
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Total memory: 985 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory before tmpfs mount: 189 MB\nFree memory before tmpfs mount: 795 MB\nMemory to use in integrity test (60% of free, min 64MB): 477 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"477+0 records in\n477+0 records out\n500170752 bytes (500 MB, 477 MiB) copied, 2.15975 s, 232 MB/s\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stderr:"\tCommand being timed: \"dd if=/dev/urandom of=/mnt/testfile bs=1M count=477\"\n\tUser time (seconds): 0.00\n\tSystem time (seconds): 2.12\n\tPercent of CPU this job got: 97%\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 0:02.17\n\tAverage shared text size (kbytes): 0\n\tAverage unshared data size (kbytes): 0\n\tAverage stack size (kbytes): 0\n\tAverage total size (kbytes): 0\n\tMaximum resident set size (kbytes): 2736\n\tAverage resident set size (kbytes): 0\n\tMajor (requiring I/O) page faults: 3\n\tMinor (reclaiming a frame) page faults: 346\n\tVoluntary context switches: 4\n\tInvoluntary context switches: 5\n\tSwaps: 0\n\tFile system inputs: 176\n\tFile system outputs: 0\n\tSocket messages sent: 0\n\tSocket messages received: 0\n\tSignals delivered: 0\n\tPage size (bytes): 4096\n\tExit status: 0\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{data:{stdout:"Used memory after tmpfs mount and file fill: 672 MB\n"}}
    sandbox_memory_integrity_test.go:70: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:70: Command [bash] completed successfully in sandbox i18uqv7jitcupdrhucbjb
Executing command bash in sandbox i18uqv7jitcupdrhucbjb (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1274}}
Executing command bash in sandbox it6sfqk19kqut6r8gdic5 (user: root)
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{data:{stdout:"66913880a0dac47ccf966b5d6fb4fe422f7db6f8bc71b05d08c4559d8fb34276\n"}}
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{end:{exited:true  status:"exit status 0"}}
    sandbox_memory_integrity_test.go:80: Command [bash] completed successfully in sandbox i18uqv7jitcupdrhucbjb
    sandbox_memory_integrity_test.go:80: Command [bash] output: event:{start:{pid:1277}}
Executing command bash in sandbox i18uqv7jitcupdrhucbjb (user: root)
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:81
        	            				.../hostedtoolcache/go/1.26.3.../src/runtime/asm_amd64.s:1771
        	Error:      	Received unexpected error:
        	            	failed to execute command bash in sandbox i18uqv7jitcupdrhucbjb: unavailable: HTTP status 502 Bad Gateway
    sandbox_memory_integrity_test.go:110: 
        	Error Trace:	.../tests/orchestrator/sandbox_memory_integrity_test.go:78
        	            				.../tests/orchestrator/sandbox_memory_integrity_test.go:110
        	Error:      	Condition never satisfied
        	Test:       	TestSandboxMemoryIntegrity/tmpfs_hash
--- FAIL: TestSandboxMemoryIntegrity/tmpfs_hash (202.12s)

bchalios and others added 3 commits June 2, 2026 21:29
Add a method that releases the backing pages of a memfd range via
MADV_REMOVE, freeing the shmem/hugetlbfs backing store like a hole-punch.

This is the primitive for incrementally freeing guest memory during a
snapshot copy: once a dirty range has been read into the cache, its huge
pages can be released so peak (source + destination) memory stays ~1x
instead of ~2x for large sandboxes.

The range must be huge-page aligned (hugetlbfs rejects sub-hugepage
ranges); callers iterate block-aligned dirty ranges, so a misaligned
range returns an error as a programming-error signal. The mapping is
PROT_READ so there is intentionally no clear() fallback.

Not wired up yet; that lands in a later commit behind a feature flag.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Add a boolean flag gating incremental freeing of guest memory from the
memfd during a snapshot copy. Default off. Only takes effect when
use-memfd is on, and the wiring (next commit) additionally restricts it
to the pause-and-discard path since the punch is destructive to the
guest pages.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>

Wire memfd-punch-on-snapshot through the snapshot copy path. When the
flag is on, copyFromMemfd frees each guest range from the memfd via
MADV_REMOVE right after copying it into the cache, so peak (source +
destination) memory during pause stays ~1x instead of ~2x for large
sandboxes.

Threaded punchSource through NewCacheFromMemfd / NewCacheFromMemfdAsync,
ExportMemory, and pauseProcessMemory; Pause computes it from the flag.
The page-granular dedup path does not support incremental punching and
ignores the flag.

The punch is destructive to guest memory, so it must only run when the
VM is committed to teardown: a failed copy leaves both the cache and the
memfd unusable. The flag is off by default; enabling it requires that a
failed pause discards the sandbox rather than resuming it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
@bchalios bchalios force-pushed the direct-memfd-populate branch from 21e8968 to 569570a Compare June 2, 2026 19:30