perf(orch): free guest memory incrementally during snapshot copy#2902
perf(orch): free guest memory incrementally during snapshot copy#2902bchalios wants to merge 3 commits into
Conversation
PR SummaryHigh Risk Overview Memfd.punchHole requires 2 MiB-aligned offsets and lengths, but punching uses the same BitsetRanges spans as the copy, which are only guaranteed to match diff block size (often 4 KiB), so enabling the flag may fail snapshots unless ranges are huge-page-aligned. Punching is destructive to all mappings of the memfd; a partial copy can leave guest memory and cache inconsistent, and rollback after punch is not safe—only appropriate when pause always discards the VM. Reviewed by Cursor Bugbot for commit 569570a. Bugbot is set up for automated code reviews on this repo. Configure here. |
There was a problem hiding this comment.
Code Review
The implementation of punchHole incorrectly hardcodes header.HugepageSize for alignment checks and lacks bounds checking on the slice operation, which can cause snapshot copy failures on non-hugepage sandboxes and lead to runtime panics. Consequently, the call to punchHole in copyFromMemfd must be updated to pass the actual block size parameter to ensure correctness and safety.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func (m *Memfd) punchHole(off, length int64) error { | ||
| if off%header.HugepageSize != 0 || length%header.HugepageSize != 0 { | ||
| return fmt.Errorf("punch range [%d,%d) not %d-aligned", off, off+length, header.HugepageSize) | ||
| } | ||
|
|
||
| return unix.Madvise(m.mmap[off:off+length], unix.MADV_REMOVE) | ||
| } |
There was a problem hiding this comment.
The current implementation of punchHole hardcodes header.HugepageSize for alignment checks and lacks bounds checking on the slice operation. This will cause the snapshot copy to fail on non-hugepage sandboxes where the block size is 4 KiB, and can lead to runtime panics if the range is out of bounds. Accepting the actual page size as a parameter and validating the slice boundaries ensures both correctness and safety.
| func (m *Memfd) punchHole(off, length int64) error { | |
| if off%header.HugepageSize != 0 || length%header.HugepageSize != 0 { | |
| return fmt.Errorf("punch range [%d,%d) not %d-aligned", off, off+length, header.HugepageSize) | |
| } | |
| return unix.Madvise(m.mmap[off:off+length], unix.MADV_REMOVE) | |
| } | |
| func (m *Memfd) punchHole(off, length, pageSize int64) error { | |
| if off < 0 || length < 0 || off+length > int64(len(m.mmap)) { | |
| return fmt.Errorf("punch range [%d,%d) out of bounds (size %d)", off, off+length, len(m.mmap)) | |
| } | |
| if off%pageSize != 0 || length%pageSize != 0 { | |
| return fmt.Errorf("punch range [%d,%d) not %d-aligned", off, off+length, pageSize) | |
| } | |
| return unix.Madvise(m.mmap[off:off+length], unix.MADV_REMOVE) | |
| } |
| if punchSource { | ||
| if err := memfd.punchHole(r.Start, r.Size); err != nil { | ||
| return fmt.Errorf("punch memfd source [%d,%d): %w", r.Start, r.Start+r.Size, err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Update the call to punchHole to pass the blockSize parameter, which represents the actual page size of the memfd, matching the updated method signature.
| if punchSource { | |
| if err := memfd.punchHole(r.Start, r.Size); err != nil { | |
| return fmt.Errorf("punch memfd source [%d,%d): %w", r.Start, r.Start+r.Size, err) | |
| } | |
| } | |
| if punchSource { | |
| if err := memfd.punchHole(r.Start, r.Size, blockSize); err != nil { | |
| return fmt.Errorf("punch memfd source [%d,%d): %w", r.Start, r.Start+r.Size, err) | |
| } | |
| } |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 21e8968fead17683d77d70ff85ffbf8d09941ae8. Configure here.
| if err := memfd.punchHole(r.Start, r.Size); err != nil { | ||
| return fmt.Errorf("punch memfd source [%d,%d): %w", r.Start, r.Start+r.Size, err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Punch misaligned for 4KiB blocks
Medium Severity
With memfd-punch-on-snapshot enabled, copyFromMemfd calls punchHole using each BitsetRanges offset and size at diffMetadata.BlockSize. When the sandbox uses 4 KiB guest pages, those ranges are only 4 KiB-aligned, so punchHole rejects them and pause fails even though the copy path works.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 21e8968fead17683d77d70ff85ffbf8d09941ae8. Configure here.
❌ 3 Tests Failed:
View the top 1 failed test(s) by shortest run time
View the full list of 2 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard |
Add a method that releases the backing pages of a memfd range via MADV_REMOVE, freeing the shmem/hugetlbfs backing store like a hole-punch. This is the primitive for incrementally freeing guest memory during a snapshot copy: once a dirty range has been read into the cache, its huge pages can be released so peak (source + destination) memory stays ~1x instead of ~2x for large sandboxes. The range must be huge-page aligned (hugetlbfs rejects sub-hugepage ranges); callers iterate block-aligned dirty ranges, so a misaligned range returns an error as a programming-error signal. The mapping is PROT_READ so there is intentionally no clear() fallback. Not wired up yet; that lands in a later commit behind a feature flag. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Add a boolean flag gating incremental freeing of guest memory from the memfd during a snapshot copy. Default off. Only takes effect when use-memfd is on, and the wiring (next commit) additionally restricts it to the pause-and-discard path since the punch is destructive to the guest pages. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
Wire memfd-punch-on-snapshot through the snapshot copy path. When the flag is on, copyFromMemfd frees each guest range from the memfd via MADV_REMOVE right after copying it into the cache, so peak (source + destination) memory during pause stays ~1x instead of ~2x for large sandboxes. Threaded punchSource through NewCacheFromMemfd / NewCacheFromMemfdAsync, ExportMemory, and pauseProcessMemory; Pause computes it from the flag. The page-granular dedup path does not support incremental punching and ignores the flag. The punch is destructive to guest memory, so it must only run when the VM is committed to teardown: a failed copy leaves both the cache and the memfd unusable. The flag is off by default; enabling it requires that a failed pause discards the sandbox rather than resuming it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Babis Chalios <babis.chalios@e2b.dev>
21e8968 to
569570a
Compare


Summary
On pause we copy guest memory from the memfd into the snapshot cache while the memfd is still fully resident, holding ~2× guest memory for the duration. This adds the option to punch each range out of the memfd (MADV_REMOVE) right after it's copied, keeping peak at ~1×. Gated behind a new flag, off by default.
Commits
Scope
The punch is destructive — it zeroes the pages for every mapping of the fd, including Firecracker's. A failed copy leaves both cache and memfd unusable, so it's only correct on the pause-and-discard path. Before enabling, confirm a failed pause discards the sandbox rather than resuming it. Rollback safety is enforced by convention, not code.