Skip to content

KVM: fix UEFI disk-only instance snapshot NVRAM handling#13020

Open
Kunalbehbud wants to merge 4 commits intoapache:4.22from
Kunalbehbud:fix/kvm-uefi-disk-only-instance-snapshot-nvram-4.22
Open

KVM: fix UEFI disk-only instance snapshot NVRAM handling#13020
Kunalbehbud wants to merge 4 commits intoapache:4.22from
Kunalbehbud:fix/kvm-uefi-disk-only-instance-snapshot-nvram-4.22

Conversation

@Kunalbehbud
Copy link
Copy Markdown

Description

KVM disk-only instance snapshots do not capture the active UEFI NVRAM state, which makes revert unsafe for UEFI guests.

This PR fixes the KVM disk-only instance snapshot flow to:

  • plumb optional NVRAM sidecar metadata through the management and KVM agent commands
  • copy the active UEFI NVRAM file during snapshot creation and restore it during revert
  • clean up the NVRAM sidecar during delete and merge flows
  • gate create, revert, and sidecar cleanup on host UEFI/NVRAM capabilities and clear stale capability details on reconnect
  • suspend UEFI guests before copying NVRAM, freezing filesystems first when quiescevm=true
  • preserve successful snapshot metadata if post-snapshot thaw/resume fails, surfacing the issue via warning/alert instead of discarding the snapshot
  • reject reverting UEFI disk-only snapshots that do not contain NVRAM state

Older UEFI disk-only snapshots created without an NVRAM sidecar are intentionally rejected on revert.

This PR only covers disk-only instance snapshots for KVM UEFI VMs. snapshotMemory=true / internal snapshots remain out of scope.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

N/A

How Has This Been Tested?

  • mvn -pl engine/orchestration,engine/storage/snapshot,plugins/hypervisors/kvm -am -Dtest=AgentManagerImplTest,KvmFileBasedStorageVmSnapshotStrategyTest,LibvirtDiskOnlyVMSnapshotCommandWrapperTest -Dsurefire.failIfNoSpecifiedTests=false test
  • Live-tested on a 4.22.1 KVM environment with a UEFI VM:
    • created a disk-only instance snapshot
    • reverted the snapshot and booted the VM successfully
    • deleted a parent snapshot and verified the merge path removed the old NVRAM sidecar
    • corrupted the active NVRAM after the merge, reverted again, verified the active NVRAM checksum matched the snapshot sidecar before boot, and booted successfully

How did you try to break this feature and the system with this change?

  • tried reverting a UEFI disk-only snapshot without an NVRAM sidecar and verified the revert is rejected
  • covered stale host capability details on reconnect
  • covered the case where freeze verification fails after a successful freeze and verified cleanup still thaws the guest
  • covered post-snapshot thaw/resume failures and verified the snapshot metadata is preserved while the warning/alert path is exercised

@sureshanaparti
Copy link
Copy Markdown
Contributor

@blueorangutan package

@blueorangutan
Copy link
Copy Markdown

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 53.06604% with 199 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.67%. Comparing base (0c86899) to head (43e9809).
⚠️ Report is 2 commits behind head on 4.22.

Files with missing lines Patch % Lines
...LibvirtCreateDiskOnlyVMSnapshotCommandWrapper.java 53.16% 60 Missing and 14 partials ⚠️
...napshot/KvmFileBasedStorageVmSnapshotStrategy.java 62.38% 30 Missing and 11 partials ⚠️
...LibvirtRevertDiskOnlyVMSnapshotCommandWrapper.java 55.00% 19 Missing and 8 partials ⚠️
...java/com/cloud/agent/manager/AgentManagerImpl.java 45.45% 5 Missing and 7 partials ⚠️
...t/api/storage/CreateDiskOnlyVmSnapshotCommand.java 0.00% 10 Missing ⚠️
...t/api/storage/DeleteDiskOnlyVmSnapshotCommand.java 38.46% 8 Missing ⚠️
...t/api/storage/RevertDiskOnlyVmSnapshotCommand.java 42.85% 8 Missing ⚠️
...LibvirtDeleteDiskOnlyVMSnapshotCommandWrapper.java 66.66% 6 Missing and 2 partials ⚠️
...ervisor/kvm/resource/LibvirtComputingResource.java 0.00% 7 Missing ⚠️
...nt/api/storage/CreateDiskOnlyVmSnapshotAnswer.java 33.33% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.22   #13020      +/-   ##
============================================
+ Coverage     17.62%   17.67%   +0.04%     
- Complexity    15723    15780      +57     
============================================
  Files          5921     5921              
  Lines        532841   533245     +404     
  Branches      65191    65230      +39     
============================================
+ Hits          93914    94228     +314     
- Misses       428310   428345      +35     
- Partials      10617    10672      +55     
Flag Coverage Δ
uitests 3.69% <ø> (ø)
unittests 18.74% <53.06%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@blueorangutan
Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17485

@winterhazel winterhazel requested a review from JoaoJandre April 14, 2026 14:10
@github-actions
Copy link
Copy Markdown

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

kunal.behbudzade added 3 commits April 14, 2026 20:46
Add the command payload, answer metadata, and host capability plumbing required for KVM disk-only instance snapshots to carry UEFI NVRAM state between management and the KVM agent.

Also synchronize host capability booleans on reconnect so stale UEFI/NVRAM support details are removed when an older agent reconnects.
Copy the active UEFI NVRAM file as a sidecar during disk-only instance snapshot creation, restore it on revert, and clean it up during delete and merge flows.

Also tighten host capability checks, preserve successful snapshot metadata when post-snapshot thaw or resume fails, and reject reverting UEFI disk-only snapshots that do not contain NVRAM state.
Cover host capability synchronization, UEFI NVRAM sidecar handling across create/revert/delete flows, and the running-VM recovery paths for disk-only instance snapshots.
@Kunalbehbud Kunalbehbud force-pushed the fix/kvm-uefi-disk-only-instance-snapshot-nvram-4.22 branch from f657823 to 2bc9051 Compare April 14, 2026 17:50
@Kunalbehbud
Copy link
Copy Markdown
Author

Rebased this branch on the latest 4.22 to resolve the merge conflict in AgentManagerImpl after the recent VDDK host detail changes.

Local verification passed again on top of the updated base:
mvn -pl engine/orchestration,engine/storage/snapshot,plugins/hypervisors/kvm -am -Dtest=AgentManagerImplTest,KvmFileBasedStorageVmSnapshotStrategyTest,LibvirtDiskOnlyVMSnapshotCommandWrapperTest -Dsurefire.failIfNoSpecifiedTests=false test

The new GitHub Actions runs for this fork push are currently in action_required, so they appear to be waiting for maintainer approval before re-running.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes KVM disk-only instance snapshots for UEFI VMs by capturing/restoring the active NVRAM state via an NVRAM “sidecar” file, and gating these flows on explicit host capabilities to prevent unsafe reverts.

Changes:

  • Plumbs UEFI/NVRAM metadata through management ↔ agent commands (create/revert/delete) and persists NVRAM sidecar path in snapshot details.
  • Implements NVRAM sidecar copy on snapshot creation and atomic restore on revert; adds sidecar cleanup on delete/merge flows.
  • Adds host capability advertising + reconnect-time capability syncing/cleanup, plus targeted unit tests for the new behavior.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
plugins/hypervisors/kvm/src/test/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtDiskOnlyVMSnapshotCommandWrapperTest.java New unit tests covering NVRAM sidecar backup/restore/cleanup and freeze/suspend sequencing/warnings.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtCreateDiskOnlyVMSnapshotCommandWrapper.java Copies active NVRAM into a per-snapshot sidecar; adds freeze/suspend + post-snapshot recovery warnings.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtRevertDiskOnlyVMSnapshotCommandWrapper.java Validates presence of NVRAM sidecar for UEFI reverts and atomically restores active NVRAM from the sidecar.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtDeleteDiskOnlyVMSnapshotCommandWrapper.java Deletes NVRAM sidecar when present, using either provided primary datastore or root snapshot lookup.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtReadyCommandWrapper.java Advertises the new host capability for NVRAM-aware disk-only snapshots.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/LibvirtComputingResource.java Adds host detail advertising at startup and provides getUefiNvramPath(vmUuid) helper.
engine/storage/snapshot/src/test/java/org/apache/cloudstack/storage/vmsnapshot/KvmFileBasedStorageVmSnapshotStrategyTest.java New tests validating NVRAM path persistence, agent command plumbing, and capability gating.
engine/storage/snapshot/src/main/java/org/apache/cloudstack/storage/vmsnapshot/KvmFileBasedStorageVmSnapshotStrategy.java Persists NVRAM sidecar metadata, gates create/revert/delete/merge cleanup on capabilities, sends alert on recovery warnings.
engine/orchestration/src/test/java/com/cloud/agent/manager/AgentManagerImplTest.java Adds test ensuring stale NVRAM capability is cleared on reconnect when no longer advertised.
engine/orchestration/src/main/java/com/cloud/agent/manager/AgentManagerImpl.java Syncs boolean host capabilities from ReadyAnswer details, including removal of stale DB details when not advertised.
core/src/main/java/com/cloud/agent/api/storage/CreateDiskOnlyVmSnapshotCommand.java Adds vmUuid and uefiEnabled fields for agent-side NVRAM handling decisions.
core/src/main/java/com/cloud/agent/api/storage/CreateDiskOnlyVmSnapshotAnswer.java Adds nvramSnapshotPath to return sidecar path back to management.
core/src/main/java/com/cloud/agent/api/storage/RevertDiskOnlyVmSnapshotCommand.java Adds vmUuid, uefiEnabled, and nvramSnapshotPath to support validated/atomic NVRAM restores.
core/src/main/java/com/cloud/agent/api/storage/DeleteDiskOnlyVmSnapshotCommand.java Adds optional nvramSnapshotPath and primaryDataStore to support sidecar-only cleanup.
api/src/main/java/com/cloud/host/Host.java Introduces HOST_KVM_DISK_ONLY_VM_SNAPSHOT_NVRAM host capability key.
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtConvertInstanceCommandWrapper.java Fixes a typo in an error message (“calll” → “call”).
agent/conf/agent.properties Removes a trailing whitespace-only line.
PendingReleaseNotes Adds release note about UEFI disk-only snapshots now capturing NVRAM and legacy snapshot revert rejection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +316 to +325
protected void verifyVmFilesystemsFrozen(Domain domain, String vmName) throws LibvirtException, IOException {
String status = getResultOfQemuCommand(FreezeThawVMCommand.STATUS, domain);
if (StringUtils.isBlank(status) || !new JsonParser().parse(status).isJsonObject()) {
throw new IOException(String.format("Failed to verify VM [%s] filesystem freeze state before taking the disk-only VM snapshot. Result: %s", vmName, status));
}

String statusResult = new JsonParser().parse(status).getAsJsonObject().get("return").getAsString();
if (!FreezeThawVMCommand.FREEZE.equals(statusResult)) {
throw new IOException(String.format("Failed to freeze VM [%s] filesystems before taking the disk-only VM snapshot. Status: %s", vmName, statusResult));
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verifyVmFilesystemsFrozen assumes the guest-agent response is valid JSON with a string return field. JsonParser.parse(...), .get("return"), and .getAsString() can throw unchecked exceptions (e.g., JsonSyntaxException/IllegalStateException/NullPointerException) on malformed/unexpected responses, which will bypass the surrounding catch (LibvirtException | IOException) and can cause the agent command to fail without returning a proper Answer (and potentially leave the guest frozen until the finally recovery runs). Consider parsing defensively (e.g., catch runtime JSON parsing exceptions and rethrow as IOException), validating the presence/type of return, and treating JSON error responses explicitly as failures.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants