Design for virt qcow2 datamover #2037
base: oadp-dev
Conversation
Walkthrough

Adds a new design proposal document describing a KubeVirt incremental datamover to integrate KubeVirt incremental backups with OADP/Velero, detailing goals, use cases, high-level design, data/metadata layout for qcow2 files, controller responsibilities, backup/restore flows, and open questions.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Actionable comments posted: 2
🧹 Nitpick comments (3)
docs/design/virt-qcow2-datamover.md (3)
1-25: Fix markdown formatting issues and correct spelling in intro section. Line 13 contains a bare URL; wrap it in angle brackets or use markdown link syntax. Lines 20–22 have incorrect list indentation (4 spaces instead of 2). Line 16 has a spelling error: "mantainability" should be "maintainability".
Apply these fixes:
- Implementation based on kubevirt enhancement defined at https://github.com/kubevirt/enhancements/blob/main/veps/sig-storage/incremental-backup.md
+ Implementation based on kubevirt enhancement defined at <https://github.com/kubevirt/enhancements/blob/main/veps/sig-storage/incremental-backup.md>

- Deep integration with velero data mover pluggability (this could be considered in the long-term though, which would minimize duplication of effort and enhance mantainability)
+ Deep integration with velero data mover pluggability (this could be considered in the long-term though, which would minimize duplication of effort and enhance maintainability)

- As a user I want to use OADP to trigger backups that will back up volume data using libvirt tooling rather than CSI snapshots
- - Volume backups will be incremental when possible (first backup for a given volume will be full, subsequent backups for that same volume will be incremental)
- - Based on the assumption that libvirt incremental volume backups should be much faster than CSI snapshots followed by incremental kopia snapshot copy, the expectation is that users might run libvirt-based OADP backups more frequently than they would for CSI-based backups.
- - If users are backing up more frequently with this method, they should ensure that they are not using CSI snapshots or fs-backup via the resource/volume policy configuration in the backup.
+ As a user I want to use OADP to trigger backups that will back up volume data using libvirt tooling rather than CSI snapshots
+ - Volume backups will be incremental when possible (first backup for a given volume will be full, subsequent backups for that same volume will be incremental)
+ - Assuming libvirt incremental backups are faster than CSI snapshots with incremental kopia, users may run these backups more frequently.
+ - If users are backing up more frequently with this method, they should ensure that they are not using CSI snapshots or fs-backup via the resource/volume policy configuration in the backup.
82-98: Correct spelling errors and hyphenation in storage strategy section. Line 86 should use a hyphen: "top level" → "top-level" (compound adjective). Line 96 has a spelling error: "invidivual" should be "individual". Consider also tightening lines 96–97 for clarity by replacing "I'm not sure we get any real benefits" with a more direct phrasing, and simplifying "on top of that" to "additionally" or "furthermore".
Apply these fixes:
- Create a top level dir in the BSL (under the BSL prefix, parallel to backups/restores/kopia) for libvirt datamover.
+ Create a top-level dir in the BSL (under the BSL prefix, parallel to backups/restores/kopia) for libvirt datamover.

- We could use kopia on top of the object storage API, but I'm not sure we get any real benefits, since we're already working with files that represent just the data diff we need. We can just manage them as invidivual objects
+ We could use kopia on top of the object storage API. However, since we're already working with files representing just the data diff needed, we can manage them as individual objects

- This will also require additional overhead around kopia maintenance, etc. and on top of that, we still may need to manage qcow2 file deletion manually.
+ This will also require additional overhead around kopia maintenance, and additionally, we still may need to manage qcow2 file deletion manually.
99-113: Correct spelling error in force full backup discussion. Line 108 has a spelling error: "datmover" should be "datamover". Also apply consistent 2-space list indentation throughout lines 101–104.
Apply this fix:
- For initial release, we can add a force-full-virt-backup annotation on the velero backup. Longer-term, we can push for a general datamover feature in velero which could force full backups for both fs-backup and velero datamover if backup.spec.forceFullVolumeBackup is true, and once implemented, the qcow2 datmover can use this as well.
+ For initial release, we can add a force-full-virt-backup annotation on the velero backup. Longer-term, we can push for a general datamover feature in velero which could force full backups for both fs-backup and velero datamover if backup.spec.forceFullVolumeBackup is true, and once implemented, the qcow2 datamover can use this as well.

For the Open Questions section, ensure nested lists use 2-space indentation:
- ### Open questions
- - How to determine PVC size?
- - user-configurable? configmap or annotation?
- - From the kubevirt enhancement: "Before the process begins, an estimation of the required backup size will be performed. If the provided PVC size is insufficient, an error will be returned"
- - If the PVC is too small, we need a clear error on the backup indicating that it failed due to insufficient PVC space.
- - If controller is responsible for PVC creation rather than plugin, then the controller may be able to respond to PVC too small errors by retrying with a larger PVC.
+ ### Open questions
+ - How to determine PVC size?
+ - user-configurable? configmap or annotation?
+ - From the kubevirt enhancement: "Before the process begins, an estimation of the required backup size will be performed. If the provided PVC size is insufficient, an error will be returned"
+ - If the PVC is too small, we need a clear error on the backup indicating that it failed due to insufficient PVC space.
+ - If controller is responsible for PVC creation rather than plugin, then the controller may be able to respond to PVC too small errors by retrying with a larger PVC.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
docs/design/virt-qcow2-datamover.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
docs/design/virt-qcow2-datamover.md
🪛 LanguageTool
docs/design/virt-qcow2-datamover.md
[style] ~21-~21: ‘Based on the assumption that’ might be wordy. Consider a shorter alternative.
Context: ... same volume will be incremental) - Based on the assumption that libvirt incremental volume backups shou...
(EN_WORDINESS_PREMIUM_BASED_ON_THE_ASSUMPTION_THAT)
[grammar] ~31-~31: Ensure spelling is correct
Context: ...ckupItemAction/RestoreItemAction plugins - VitualMachineInstance BIA plugin - This plugin will check...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~41-~41: Ensure spelling is correct
Context: ...ill be added to the PVC identifying the VMInstance we're backing up. - This PVC probab...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~75-~75: Ensure spelling is correct
Context: ...ackup and incremental files from object store, repeatedly call `qemu-img rebase -b fu...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~80-~80: Ensure spelling is correct
Context: ...ored the VM disks, it will exit and the VMInstance can launch with these disks (following ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~86-~86: Use a hyphen to join words.
Context: ...ence/object-store.go` - Create a top level dir in the BSL (under the BSL pref...
(QB_NEW_EN_HYPHEN)
[style] ~96-~96: Consider adding a conjunction to make this sound more formal.
Context: ... of the object storage API, but I'm not sure we get any real benefits, since we're alre...
(NOT_SURE_IT_WORKS)
[grammar] ~96-~96: Ensure spelling is correct
Context: ...iff we need. We can just manage them as invidivual objects - This will also requir...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~96-~96: Ensure spelling is correct
Context: ...n just manage them as invidivual objects - This will also require additional overhe...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~97-~97: ‘on top of that’ might be wordy. Consider a shorter alternative.
Context: ...head around kopia maintenance, etc. and on top of that, we still may need to manage qcow2 file...
(EN_WORDINESS_PREMIUM_ON_TOP_OF_THAT)
[style] ~106-~106: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ld have more than one snapshot file. We need to handle this mapping, including the orde...
(REP_NEED_TO_VB)
[grammar] ~108-~108: Ensure spelling is correct
Context: ...s true, and once implemented, the qcow2 datmover can use this as well. ### General note...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
docs/design/virt-qcow2-datamover.md
13-13: Bare URL used
(MD034, no-bare-urls)

20-22, 33-36, 40-46, 48-50, 55-60, 63-64, 69-72, 80, 85-86, 88-89, 91, 95, 101-104: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)

37-39, 61-62, 65-67, 73-79, 87, 92-94, 96-97: Unordered list indentation
Expected: 4; Actual: 8
(MD007, ul-indent)

63-64: Link fragments should be valid
Expected: #wherehow-to-store-qcow2-files-and-metadata; Actual: #Wherehow-to-store-qcow2-files-and-metadata
(MD051, link-fragments)
43551b2 to b64475c (Compare)
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
docs/design/virt-qcow2-datamover.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
docs/design/virt-qcow2-datamover.md
🪛 LanguageTool
docs/design/virt-qcow2-datamover.md
[grammar] ~77-~77: Ensure spelling is correct
Context: ...ackup and incremental files from object store, repeatedly call `qemu-img rebase -b fu...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~108-~108: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ld have more than one snapshot file. We need to handle this mapping, including the orde...
(REP_NEED_TO_VB)
🪛 markdownlint-cli2 (0.18.1)
docs/design/virt-qcow2-datamover.md
13-13: Bare URL used
(MD034, no-bare-urls)
14-14: Bare URL used
(MD034, no-bare-urls)
b64475c to 46a2119 (Compare)
Actionable comments posted: 3
♻️ Duplicate comments (1)
docs/design/virt-qcow2-datamover.md (1)
29-32: Be explicit about YAML keys vs Go struct fields to avoid implementation drift
The doc uses Go-style (Spec.DataMover, Spec.SnapshotType) in prose; implementers may copy these into manifests/controllers incorrectly. Recommend standardizing on manifest-style (e.g., spec.dataMover, spec.snapshotType, spec.snapshotMoveData) and adding a one-time mapping note to Go fields. Also applies to: 46-48, 55-56, 113-113
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
docs/design/virt-qcow2-datamover.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
docs/design/virt-qcow2-datamover.md
🪛 LanguageTool
docs/design/virt-qcow2-datamover.md
[grammar] ~77-~77: Ensure spelling is correct
Context: ...ackup and incremental files from object store, repeatedly call `qemu-img rebase -b fu...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~108-~108: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ld have more than one snapshot file. We need to handle this mapping, including the orde...
(REP_NEED_TO_VB)
46a2119 to 276101e (Compare)
/test unit-test

/test unit-test
shubham-pampattiwar
left a comment
Great design, @sseago! Excited to see this move forward; qcow2 incremental backups will be a big win for VM backup performance. Left a few comments with questions and suggestions.
| - Assuming that kubevirt incremental volume backups should be much faster than CSI snapshots followed by incremental kopia snapshot copy, the expectation is that users might run kubevirt datamover backups more frequently than they would for CSI-based backups. | ||
| - If users are backing up more frequently with this method, they should ensure that they are not using CSI snapshots or fs-backup via the resource/volume policy configuration in the backup. | ||
The design might benefit from something like a comparison table for CSI DM vs QCOW2 DM (just a thought):
Key Differences
┌────────────────────┬──────────────────────────────────────────┬────────────────────────────────────────────┐
│ Aspect │ CSI Approach │ KubeVirt qcow2 Approach │
├────────────────────┼──────────────────────────────────────────┼────────────────────────────────────────────┤
│ Layer │ Storage (CSI driver) │ Hypervisor (QEMU/libvirt) │
├────────────────────┼──────────────────────────────────────────┼────────────────────────────────────────────┤
│ Snapshot mechanism │ CSI VolumeSnapshot │ VirtualMachineBackup CR │
├────────────────────┼──────────────────────────────────────────┼────────────────────────────────────────────┤
│ Incremental │ Kopia deduplication (scans whole volume) │ True block-level CBT (only changed blocks) │
├────────────────────┼──────────────────────────────────────────┼────────────────────────────────────────────┤
│ Data mover │ Velero's node-agent + kopia │ New OADP controller + qemu-img │
├────────────────────┼──────────────────────────────────────────┼────────────────────────────────────────────┤
│ VM awareness │ None (just sees PVC) │ Full (knows it's a VM disk) │
└────────────────────┴──────────────────────────────────────────┴────────────────────────────────────────────┘
| - Using kopia | ||
| - We could use kopia on top of the object storage API, but it is not clear that this will provide any real benefits, since we're already working with files that represent just the data diff we need. We can just manage them as individual objects. | ||
| - This will also require additional overhead around kopia maintenance, and we still may need to manage qcow2 file deletion manually. | ||
The backup and restore flows involve multiple components (BIA/RIA plugins, DataUpload/DataDownload, controller, VirtualMachineBackup CR, datamover pods). Visual diagrams would help readers follow the sequence. Here are suggested mermaid diagrams (please let me know if they are accurate, or adjust them as needed before adding them to the design doc):
- Backup Flow Diagram:
sequenceDiagram
participant Velero
participant VMI_BIA as VMI BIA Plugin
participant DU as DataUpload CR
participant Controller as Kubevirt Datamover Controller
participant VMBackup as VirtualMachineBackup CR
participant KubeVirt
participant TempPVC as Temporary PVC
participant Pod as Datamover Pod
participant BSL as Object Storage (BSL)
Velero->>VMI_BIA: Backup VirtualMachineInstance
VMI_BIA->>VMI_BIA: Check CBT enabled & volume policies
VMI_BIA->>TempPVC: Create temporary PVC
VMI_BIA->>DU: Create DataUpload (DataMover: "kubevirt")
Controller->>DU: Reconcile DataUpload
Controller->>VMBackup: Create VirtualMachineBackup CR
VMBackup->>KubeVirt: Trigger backup
KubeVirt->>KubeVirt: Extract changed blocks via libvirt/QEMU
KubeVirt->>TempPVC: Write qcow2 file(s)
KubeVirt->>VMBackup: Update status: Complete
Controller->>Pod: Launch datamover pod
Pod->>TempPVC: Mount & read qcow2 files
Pod->>BSL: Upload qcow2 + metadata
Pod->>Controller: Exit success
Controller->>DU: Update status: Complete
- Restore Flow Diagram
sequenceDiagram
participant Velero
participant VMI_RIA as VMI RIA Plugin
participant DD as DataDownload CR
participant Controller as Kubevirt Datamover Controller
participant TempPVC as Temporary PVC
participant TargetPVC as Target PVC(s)
participant Pod as Datamover Pod
participant BSL as Object Storage (BSL)
Velero->>VMI_RIA: Restore VirtualMachineInstance
VMI_RIA->>TempPVC: Create temporary PVC
VMI_RIA->>DD: Create DataDownload (DataMover: "kubevirt")
Controller->>DD: Reconcile DataDownload
Controller->>TargetPVC: Create target PVC(s) for VM disks
Controller->>Pod: Launch datamover pod
Pod->>BSL: Download qcow2 files (full + incrementals)
Pod->>TempPVC: Write qcow2 files
Pod->>Pod: qemu-img rebase (chain incrementals)
Pod->>Pod: qemu-img convert -f qcow2 -O raw
Pod->>TargetPVC: Write raw disk image
Pod->>Controller: Exit success
Controller->>DD: Update status: Complete
Controller->>TempPVC: Delete temporary PVC
Note over TargetPVC: VM can now boot with restored disk
@weshayutin has created a backup flow diagram on a wiki page. We can add a similar restore one -- comparing with these to make sure we've gotten it all right.
| - Once datamover pod has restored the VM disks, it will exit and the VirtualMachineInstance can launch with these disks (following the velero datamover model where the temporary PVC is deleted, retaining the PV, which then binds to the VM's PVCs may work here). The temporary PVC (containing the qcow2 images, not the restored VM disk image) should be completely removed at this point, including PV content. | ||
| ### Where/how to store qcow2 files and metadata |
Would the storage path in s3 be something like:
<bsl-prefix>/kubevirt-datamover/
├── manifests/
│ └── <velero-backup-name>.json # Per-backup manifest
└── checkpoints/
└── <namespace>/
└── <vmi-name>/
├── <checkpoint-id>.qcow2 # Actual qcow2 files
└── index.json # Per-VMI checkpoint chain
Or organized differently?
It won't be directly under bsl-prefix since Velero complains if there are non-velero dirs there. We'd probably need to combine <bsl-prefix> and kubevirt-datamover into a single directory name: <bsl-prefix>-kubevirt or something similar. But yes, something like this. However, we also need to make sure we capture this per-backup, i.e., for a given vmi, backup1 will include checkpoint1.qcow2, backup2 will have checkpoint1.qcow2 and checkpoint2.qcow2 (in that order). But maybe that's all captured in index.json. Although we may need to have index.json which references a list of backups and then .json with that backup's list of checkpoints and qcow2 files. Also, checkpoint name and qcow2 filename may not be guaranteed to be the same, in which case the various json files will need to make the connection explicit. The json files also probably need to list the PVCs.
OK, so some of my suggestions above are handled better by your later examples.
docs/design/kubevirt-datamover.md
Outdated
| - Use the Velero object store plugin API but not the velero-specific logic in `persistence/object-store.go` | ||
| - Create a top-level dir in the BSL (under the BSL prefix, parallel to backups/restores/kopia) for kubevirt datamover. | ||
| - Actually, this may have to be outside the prefix (i.e. if prefix is "data" then we may need to create a parallel dir "data-kubevirt" or something similar, since I think Velero allows only its own files under the prefix) | ||
| - Copy individual qcow2 files and metadata files identifying the required qcow2 checkpoints. We may want a subdir per VirtualMachineInstance for qcow2 files. For metadata files, these should probably be organized by velero backup. |
What fields are in the metadata files? How do we map Velero backups to their required checkpoints?
Also, let's say we have Backup A and Backup B
Backup A: uses checkpoints 1, 2, 3
Backup B: uses checkpoints 1, 2, 3, 4, 5
Delete Backup A → can't delete 1, 2, 3 (still needed by B)
Delete Backup B → now can delete 4, 5, and also 1, 2, 3
How is this reference counting implemented? We will have to store this mapping, and we would need some structure for this.
Added suggestion: #2037 (comment) and #2037 (comment)
See my comments above where you reference the dir structure. The vmi->backup(s)->checkpoints connections will definitely need to be part of the definition of what files we store and what's in them.
| - Launch kubevirt datamover pod mounting the temporary PVC with the qcow2 file(s) from the backup. | ||
| - This pod needs to be running a command that will do the datamover operation from pvc to object storage | ||
| - Will there be multiple qcow2 files if there are multiple disks with CBT enabled? Will these all use the same checkpoint name? | ||
| - Copy the new file to object storage (see [Where to store qcow2 files](#wherehow-to-store-qcow2-files-and-metadata) below) |
How will the pod receive BSL configuration and credentials? For example, will the controller inject the BSL details via environment variables and mount the credential secret?
Should be similar to what datamover pods do I'd imagine. I haven't looked at that recently, but presumably DU links to Backup which then lists BSL?
And yes, mounting/accessing the credentials is something we can probably handle in the same way we do for datamover and fs-backup pods upstream.
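For illustration, a minimal Go sketch of how the controller might build the datamover pod with the temporary PVC and the BSL credential secret mounted, following the datamover/fs-backup pattern mentioned above. The namespace, image, secret name, and env var here are assumptions, not decisions from this design:

```go
// Hypothetical sketch: wiring BSL credentials and the temporary PVC into the
// datamover pod. Secret name, namespace, image, and env var are placeholders.
package datamover

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func datamoverPod(duName, pvcName, bslSecret string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: duName + "-datamover", Namespace: "openshift-adp"},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:  "datamover",
				Image: "kubevirt-datamover:latest", // assumed to be the same image as the controller
				Args:  []string{"backup", "--data-upload", duName},
				Env: []corev1.EnvVar{{
					// Point the object store client at the mounted credentials file.
					Name: "AWS_SHARED_CREDENTIALS_FILE", Value: "/credentials/cloud",
				}},
				VolumeMounts: []corev1.VolumeMount{
					{Name: "qcow2-data", MountPath: "/qcow2"},
					{Name: "bsl-credentials", MountPath: "/credentials", ReadOnly: true},
				},
			}},
			Volumes: []corev1.Volume{
				{Name: "qcow2-data", VolumeSource: corev1.VolumeSource{
					PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{ClaimName: pvcName},
				}},
				{Name: "bsl-credentials", VolumeSource: corev1.VolumeSource{
					Secret: &corev1.SecretVolumeSource{SecretName: bslSecret},
				}},
			},
		},
	}
}
```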
docs/design/kubevirt-datamover.md
Outdated
| - Use the Velero object store plugin API but not the velero-specific logic in `persistence/object-store.go` | ||
| - Create a top-level dir in the BSL (under the BSL prefix, parallel to backups/restores/kopia) for kubevirt datamover. | ||
| - Actually, this may have to be outside the prefix (i.e. if prefix is "data" then we may need to create a parallel dir "data-kubevirt" or something similar, since I think Velero allows only its own files under the prefix) | ||
| - Copy individual qcow2 files and metadata files identifying the required qcow2 checkpoints. We may want a subdir per VirtualMachineInstance for qcow2 files. For metadata files, these should probably be organized by velero backup. |
@sseago I am thinking something like this:
- Per-VMI Index (checkpoints/<ns>/<vmi>/index.json):
{
"vmiName": "my-vm",
"namespace": "default",
"pvcs": ["my-vm-disk-0", "my-vm-disk-1"],
"checkpoints": [
{
"id": "cp-001",
"type": "full",
"timestamp": "2025-01-10T10:00:00Z",
"files": ["cp-001-disk0.qcow2", "cp-001-disk1.qcow2"],
"referencedBy": ["backup-2025-01-10"]
},
{
"id": "cp-002",
"type": "incremental",
"parent": "cp-001",
"timestamp": "2025-01-11T10:00:00Z",
"files": ["cp-002-disk0.qcow2", "cp-002-disk1.qcow2"],
"referencedBy": ["backup-2025-01-11"]
},
{
"id": "cp-003",
"type": "incremental",
"parent": "cp-002",
"timestamp": "2025-01-12T10:00:00Z",
"files": ["cp-003-disk0.qcow2", "cp-003-disk1.qcow2"],
"referencedBy": ["backup-2025-01-12"]
}
]
}
- Per-Backup Manifest (manifests/<backup-name>.json):
{
"backupName": "backup-2025-01-12",
"timestamp": "2025-01-12T10:00:00Z",
"vmis": [
{
"namespace": "default",
"name": "my-vm",
"checkpointChain": ["cp-001", "cp-002", "cp-003"]
}
]
}
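For reference, the proposed index/manifest schema could map onto Go types roughly like the following hypothetical sketch (field names are taken from the JSON examples above and may change as the design evolves):

```go
// Hypothetical Go types mirroring the JSON sketched above; not a committed API.
package metadata

import "time"

// Checkpoint describes one qcow2 checkpoint in a VM's backup chain.
type Checkpoint struct {
	ID           string    `json:"id"`
	Type         string    `json:"type"`             // "full" or "incremental"
	Parent       string    `json:"parent,omitempty"` // empty for a full checkpoint
	Timestamp    time.Time `json:"timestamp"`
	Files        []string  `json:"files"`
	ReferencedBy []string  `json:"referencedBy"` // Velero backups that need this checkpoint
}

// VMIIndex is the per-VMI index.json.
type VMIIndex struct {
	VMIName     string       `json:"vmiName"`
	Namespace   string       `json:"namespace"`
	PVCs        []string     `json:"pvcs"`
	Checkpoints []Checkpoint `json:"checkpoints"`
}

// BackupManifest is the per-backup manifest (manifests/<backup-name>.json).
type BackupManifest struct {
	BackupName string      `json:"backupName"`
	Timestamp  time.Time   `json:"timestamp"`
	VMIs       []BackupVMI `json:"vmis"`
}

type BackupVMI struct {
	Namespace       string   `json:"namespace"`
	Name            string   `json:"name"`
	CheckpointChain []string `json:"checkpointChain"` // full checkpoint first, then incrementals in order
}
```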
When deleting backup-2025-01-11:
- Read its manifest to find referenced checkpoints (cp-001, cp-002)
- For each checkpoint, remove this backup from referencedBy in the VMI index
- If referencedBy becomes empty AND no later checkpoint depends on it as parent, delete the qcow2 files
- Delete the backup manifest
This handles the reference counting problem - cp-001 and cp-002 would still be referenced by backup-2025-01-12 (since it needs the full chain to restore).
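A hedged sketch of that reference-counting decision, reusing the hypothetical types from the previous sketch; object-store reads/writes and manifest deletion are omitted:

```go
package metadata

// checkpointsToDelete removes backupName from each checkpoint's referencedBy and
// returns the checkpoint IDs whose qcow2 files are no longer needed: those with
// no remaining references and no still-referenced descendant in the chain.
func checkpointsToDelete(idx *VMIIndex, backupName string) []string {
	referenced := map[string]bool{}
	children := map[string][]string{}
	for i := range idx.Checkpoints {
		cp := &idx.Checkpoints[i]
		// Drop this backup from referencedBy.
		refs := cp.ReferencedBy[:0]
		for _, b := range cp.ReferencedBy {
			if b != backupName {
				refs = append(refs, b)
			}
		}
		cp.ReferencedBy = refs
		if len(refs) > 0 {
			referenced[cp.ID] = true
		}
		if cp.Parent != "" {
			children[cp.Parent] = append(children[cp.Parent], cp.ID)
		}
	}
	var deletable []string
	for _, cp := range idx.Checkpoints {
		if referenced[cp.ID] || hasReferencedDescendant(cp.ID, children, referenced) {
			continue // still needed by another backup's restore chain
		}
		deletable = append(deletable, cp.ID)
	}
	return deletable
}

func hasReferencedDescendant(id string, children map[string][]string, referenced map[string]bool) bool {
	for _, child := range children[id] {
		if referenced[child] || hasReferencedDescendant(child, children, referenced) {
			return true
		}
	}
	return false
}
```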
@shubham-pampattiwar yes, I think something like this works. If we're managing the references in both directions, this makes restore easier (using checkpointChain), and also deletion (using checkpointChain and referencedBy)
@kaovilai @mpryc @shubham-pampattiwar (nice comments) keep pushing on this until we're ready to merge. Thanks!
docs/design/kubevirt-datamover.md
Outdated
| ### BackupItemAction/RestoreItemAction plugins | ||
| - VirtualMachineInstance BIA plugin | ||
| - This plugin will check to see whether QEMU backup is enabled in any of the disks -- whether `Spec.Domain.Devices.Disk[i].CBT` is `true` for at least one disk. If none are enabled, then the plugin exits without action (question: is this still valid, or is it replaced by the next?) |
I think it is
status.ChangedBlockTracking = Enabled should tell me if the VMI is using CBT, as stated in the VEP
@Joeavaikath The line below this has exactly what you propose. The question is whether this is needed at all, or needed in addition to the other. But I see no reference to this in the VEP now. I suspect this was left over from an earlier iteration of the proposal. I'll remove it.
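For illustration, the VMI BIA plugin's CBT check could look something like this hedged sketch; the status field path follows the VEP wording and is an assumption until the KubeVirt API lands:

```go
// Hypothetical sketch: skip VMIs that do not report changed-block tracking.
// The exact status field name/path is an assumption based on the VEP.
package plugin

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime"
)

// cbtEnabled returns true when the VMI reports changed-block tracking as enabled.
func cbtEnabled(item runtime.Unstructured) (bool, error) {
	obj := item.UnstructuredContent()
	val, found, err := unstructured.NestedString(obj, "status", "changedBlockTracking")
	if err != nil || !found {
		return false, err
	}
	return val == "Enabled", nil
}
```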
a6d2fef to 9336ca8 (Compare)
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@docs/design/kubevirt-datamover.md`:
- Around line 66-67: Update the design doc to remove ambiguity by adding a clear
responsibility matrix that maps each operation (temporary PVC creation,
DataDownload (DD) creation, cleanup, and any DU-related config handling) to the
owning component (RIA plugin vs controller) for both backup and restore flows;
explicitly state whether the RIA plugin or the controller is responsible for
creating temp PVCs and for creating DDs (and reference existing DU RIA
behavior), and document handoff points, error-handling ownership, and lifecycle
triggers to prevent duplicate work or gaps.
- Around line 186-199: The VirtualMachine BIA plugin currently doesn't enforce
the required SnapshotMoveData=true prerequisite; add an explicit validation at
the start of the plugin's BackupItemAction handling (e.g., in the plugin's
Validate/Execute/Run method for the VirtualMachine BIA plugin referenced around
lines 50-62) that inspects backup.Spec.SnapshotMoveData and immediately return a
clear, non-retriable error (and log a descriptive message) if it is false so the
backup fails fast instead of letting the datamover controller encounter
downstream failures.
- Around line 85-97: The restore section under "DataDownload reconciler
(restore)" lacks concrete design and must be expanded: define error handling and
retry semantics for the datamover pod/qemu-img steps (include failure modes,
backoff, and when to abort), specify PVC sizing rules and validation for
restored VM disks (how to compute required size and quota checks), add integrity
verification (e.g., checksums or qemu-img check) after each rebase/convert and
before writing to the VM PVC, design rollback/cleanup procedures for partial
failures (how to handle multi-disk VMs, atomicity expectations, PV/PVC retention
or trashing), and include explicit success/failure criteria and a fallback path
if the kubevirt API enhancement is unavailable (document manual qemu-img
invocation and automation steps). Ensure references to the datamover pod,
qemu-img rebase/convert sequence, temporary PVC lifecycle, and VirtualMachine
binding behavior are updated with these details.
- Around line 99-167: Add explicit concurrency and transactional rules: use
optimistic concurrency on Per-VM index.json (include a version/etag field and
require test-and-set/conditional writes against the object store when updating
referencedBy), implement a two-phase commit for checkpoints (phase 1: upload
qcow2 files under checkpoints/<ns>/<vm-name>/<checkpoint-id>/tmp and write a
tentative checkpoint record; phase 2: atomically rename/commit the checkpoint
and then CAS-update the index.json checkpointChain/referencedBy), and provide a
per-VM lock fallback (object-store lock object or Kubernetes ConfigMap/Lease)
for retries on CAS failures to avoid corruption when two backups run
concurrently; add a garbage collector that scans checkpoints and index.json to
detect orphaned qcow2 files (cleanup after N successful CAS failures or after
tombstone marker + grace period) and document recovery for partial writes
(tombstone/rollback markers, retry logic, and idempotent commit/abort paths for
manifests in manifests/<backup-name>.json and checkpoints/*/index.json).
- Around line 69-84: The design for the Kubevirt Datamover Controller is missing
required security and operational specs; update the design doc to add concrete
requirements: enumerate ServiceAccount and RBAC rules the controller needs (for
creating VirtualMachineBackup, managing PVCs, reading BSL), define pod
securityContext defaults for the controller and datamover pod (runAsNonRoot,
seccomp, dropped capabilities), specify how datamover pods obtain BSL
credentials securely (projected Secret/CSI driver or mounted ServiceAccount
token), and add resource requests/limits for both controller and datamover
images; also add operational details: timeouts for QEMU/VMBackup operations, pod
retry/eviction/restart handling and backoff policy for the datamover pod,
logging/observability requirements (structured logs, metrics, events) and a
clear aborted-backup recovery path for
VirtualMachineBackup/VirtualMachineBackupTracker reconciliation.
🧹 Nitpick comments (1)
docs/design/kubevirt-datamover.md (1)
26-41: Add language identifier to code block for proper rendering. The comparison table code block at line 26 should specify a language identifier (e.g., `text`), or leave it empty but use proper markdown, to ensure consistent rendering across different markdown viewers.

📝 Suggested fix

-```
+```text
 Key Differences between CSI and KubeVirt incremental snapshots
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
docs/design/kubevirt-datamover.md
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
docs/design/kubevirt-datamover.md
🪛 LanguageTool
docs/design/kubevirt-datamover.md
[style] ~74-~74: Consider removing “of” to be more concise
Context: ...checkpoint for this VM and confirm that all of the related qcow2 files are available. - ...
(ALL_OF_THE)
🪛 markdownlint-cli2 (0.18.1)
docs/design/kubevirt-datamover.md
26-26: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
54-54: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 1
(MD005, list-indent)
54-54: Unordered list indentation
Expected: 4; Actual: 1
(MD007, ul-indent)
54-54: Hard tabs
Column: 1
(MD010, no-hard-tabs)
80-80: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 1
(MD005, list-indent)
80-80: Unordered list indentation
Expected: 4; Actual: 1
(MD007, ul-indent)
80-80: Hard tabs
Column: 1
(MD010, no-hard-tabs)
104-104: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
116-116: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
153-153: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| ### Kubevirt Datamover Controller | ||
| - Responsible for reconciling DataUploads/DataDownloads where `Spec.DataMover` is "kubevirt" | ||
| - DataUpload reconciler (backup): | ||
| - create the (temporary) PVC. | ||
| - identify the VirtualMachine from the PVC metadata. | ||
| - From the BSL, identify the latest checkpoint for this VM and confirm that all of the related qcow2 files are available. | ||
| - create/update the VirtualMachineBackupTracker CR with `source` set to the VirtualMachine, and `status.latestCheckpoint.name` set to match the latest checkpoint we have for this vm. Note that it might be better if we could avoid hacking the VMBT status and instead have a field on the VM -- something like `spec.overrideCheckpoint` to use a specific checkpoint if the one in the VMBT doesn't match the latest we have in the BSL. | ||
| - create the VirtualMachineBackup CR with `source` set to the VirtualMachineBackupTracker, `pvcName` set to the temporary PVC, and (optionally)`forceFullBackup`, set to `true` to force a full backup | ||
| - Wait for VMBackup to complete (monitoring status) | ||
| - Launch kubevirt datamover pod mounting the temporary PVC with the qcow2 file(s) from the backup. | ||
| - This pod needs to be running a command that will do the datamover operation from pvc to object storage | ||
| - The datamover pod functionality should be built into the same image as the kubevirt-datamover-controller pod image. | ||
| - Copy the new file to object storage (see [Where to store qcow2 files](#wherehow-to-store-qcow2-files-and-metadata) below) | ||
| - Save any required metadata to identify the stored data (collection of qcow2 pathnames/checkpoints, etc.), along with identifying the backup and VirtualMachine they're associated with. Save this metadata file as well (see [Where to store qcow2 files](#wherehow-to-store-qcow2-files-and-metadata) below) | ||
| - We need to properly handle cases where we attempt an incremental backup but a full backup is taken instead (checkpoint lost, CSI snapshot restore since last checkpoint, VM restart, etc.) | ||
| - Aborted backups also need to be handled (resulting in a failed PVC backup on the Velero side) |
Address critical security and operational gaps in controller and datamover pod specifications.
The design lacks essential implementation details that must be resolved before development:
Security requirements:
- Service accounts and RBAC permissions for controller operations (creating VirtualMachineBackup CRs, managing PVCs, accessing BSL)
- Pod security context standards (runAsNonRoot, seccomp, capabilities)
- Mechanism for datamover pods to securely access BSL credentials
- Resource limits/requests to prevent resource exhaustion attacks
Operational requirements:
- Timeout specifications for long-running QEMU operations
- Pod failure recovery strategy (handling eviction, crashes during backup/restore)
- Logging and observability approach for debugging failed operations
- Detailed handling of aborted backups (currently acknowledged at lines 83-84 but unresolved)
These specifications are prerequisites for a secure, production-ready deployment and should be added to the design before implementation begins.
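As a starting point for that RBAC discussion, a hypothetical sketch of the rules the controller might need; the KubeVirt API group and resource names for the backup CRs are assumptions, since the VEP has not finalized them:

```go
// Hypothetical RBAC sketch for the kubevirt datamover controller.
// The kubevirt.io group/resource names below are assumptions, not confirmed APIs.
package rbac

import rbacv1 "k8s.io/api/rbac/v1"

var controllerRules = []rbacv1.PolicyRule{
	{ // reconcile Velero data mover CRs handed off with dataMover: "kubevirt"
		APIGroups: []string{"velero.io"},
		Resources: []string{"datauploads", "datadownloads", "backupstoragelocations"},
		Verbs:     []string{"get", "list", "watch", "update", "patch"},
	},
	{ // create/monitor the KubeVirt backup CRs (group and resource names assumed)
		APIGroups: []string{"kubevirt.io"},
		Resources: []string{"virtualmachinebackups", "virtualmachinebackuptrackers"},
		Verbs:     []string{"get", "list", "watch", "create", "update", "delete"},
	},
	{ // temporary PVCs, datamover pods, and credential secrets
		APIGroups: []string{""},
		Resources: []string{"persistentvolumeclaims", "pods", "secrets"},
		Verbs:     []string{"get", "list", "watch", "create", "delete"},
	},
}
```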
| - DataDownload reconciler (restore) | ||
| - (this area is less well defined so far, since the kubevirt enhancement doesn't go into as much detail on restore) | ||
| - We will need a temporary PVC for pulling qcow2 images from object store (if we're restoring the empty temp PVC from backup, that might work here) | ||
| - We also need PVCs created for each VM disk we're restoring from qcow2 images. | ||
| - We'll need to create another datamover pod here which will do the following: | ||
| - pod will have temp PVC mounted, as well as PVCs mounted for each vm disk we're creating. | ||
| - pod running command/image will first get the list of qcow2 files to pull from object storage | ||
| - once we have the qcow2 full backup and incremental files from object store, repeatedly call `qemu-img rebase -b fullbackup.qcow2 -f qcow2 -u incrementalN.qcow2` for each incremental backup, in order | ||
| - Then, convert qcow2 image to raw disk image: `qemu-img convert -f qcow2 -O raw fullbackup.qcow2 restored-raw.img` | ||
| - Finally, write this image to the PVC which contains the VM disk | ||
| - (repeat process for each disk if VM has multiple disks to restore) | ||
| - Note that the various `qemu-img` actions might eventually be combined into a single kubevirt API call, but for the moment this would need to be done manually. | ||
| - Once datamover pod has restored the VM disks, it will exit and the VirtualMachine can launch with these disks (following the velero datamover model where the temporary PVC is deleted, retaining the PV, which then binds to the VM's PVCs may work here). The temporary PVC (containing the qcow2 images, not the restored VM disk image) should be completely removed at this point, including PV content. |
Restore flow requires significantly more design detail before implementation.
Lines 86-97 acknowledge the restore flow is "less well defined" and contains several uncertainties:
- Manual execution of multiple qemu-img commands without error recovery strategy
- Unclear how PVC sizing is determined for restored VM disks
- Dependency on future kubevirt API changes (line 96) without fallback
- No specification of data validation/integrity checks after restore
- No handling of partial restore failures (e.g., multi-disk VM where only some disks restore successfully)
This level of design ambiguity in a critical recovery path presents significant operational risk. Recommend completing the restore flow design with: error handling strategy, integrity verification steps, rollback procedures, and concrete success/failure criteria before implementation begins.
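To make the discussion concrete, here is a hedged Go sketch of the qemu-img sequence the restore pod could run; the exact invocation order is still open in this thread, and this version relinks each incremental to its predecessor, runs an integrity check, and converts the top of the chain. Paths and error handling are illustrative only:

```go
// Hypothetical sketch of reassembling a downloaded qcow2 chain into a raw image.
package restore

import (
	"fmt"
	"os/exec"
)

func run(args ...string) error {
	out, err := exec.Command("qemu-img", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("qemu-img %v: %w: %s", args, err, out)
	}
	return nil
}

// rebuildDisk takes the downloaded chain (full backup first, then incrementals
// in checkpoint order) and writes a raw image to the mounted VM-disk PVC.
func rebuildDisk(chain []string, targetRaw string) error {
	if len(chain) == 0 {
		return fmt.Errorf("empty qcow2 chain")
	}
	// Re-point each incremental at its predecessor (backing paths change after download).
	for i := 1; i < len(chain); i++ {
		if err := run("rebase", "-u", "-f", "qcow2", "-b", chain[i-1], chain[i]); err != nil {
			return err
		}
	}
	top := chain[len(chain)-1]
	// Integrity check before conversion (one of the asks in this review).
	if err := run("check", top); err != nil {
		return err
	}
	// Flatten the chain into a raw disk image on the target PVC.
	return run("convert", "-f", "qcow2", "-O", "raw", top, targetRaw)
}
```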
| ### Kubevirt datamover backup data/metadata | ||
|
|
||
| The kubevirt datamover data will be stored in the BSL using a prefix derived from `BSLPrefix+"-kubevirt-datamover"`. | ||
|
|
||
| The directory structure will be as follows: | ||
| ``` | ||
| <bsl-prefix>-kubevirt-datamover/ | ||
| ├── manifests/ | ||
| │ └── <velero-backup-name>.json # Per-backup manifest | ||
| └── checkpoints/ | ||
| └── <namespace>/ | ||
| └── <vm-name>/ | ||
| ├── <checkpoint-id>/ # checkpoint dir | ||
| ├── └── <vmb-name>-<disk-name>.qcow2 # Actual qcow2 files | ||
| └── index.json # Per-VM manifest | ||
| ``` | ||
| Example of a Per-VM Index file: | ||
| ``` | ||
| Per-VM Index (checkpoints/<ns>/<vm-name>/index.json): | ||
| { | ||
| "vmName": "my-vm", | ||
| "namespace": "default", | ||
| "checkpoints": [ | ||
| { | ||
| "id": "cp-001", | ||
| "type": "full", | ||
| "timestamp": "2025-01-10T10:00:00Z", | ||
| "files": ["vmb-001-disk0.qcow2", "vmb-001-disk1.qcow2"], | ||
| "pvcs": ["my-vm-pvc-0", "my-vm-pvc-1"], | ||
| "referencedBy": ["backup-2025-01-10", "backup-2025-01-11", "backup-2025-01-12"] | ||
| }, | ||
| { | ||
| "id": "cp-002", | ||
| "type": "incremental", | ||
| "parent": "cp-001", | ||
| "timestamp": "2025-01-11T10:00:00Z", | ||
| "files": ["vmb-002-disk0.qcow2", "vmb-002-disk1.qcow2"], | ||
| "pvcs": ["my-vm-pvc-0", "my-vm-pvc-1"], | ||
| "referencedBy": ["backup-2025-01-11", "backup-2025-01-12"] | ||
| }, | ||
| { | ||
| "id": "cp-003", | ||
| "type": "incremental", | ||
| "parent": "cp-002", | ||
| "timestamp": "2025-01-12T10:00:00Z", | ||
| "files": ["vmb-003-disk0.qcow2", "vmb-003-disk1.qcow2"], | ||
| "pvcs": ["my-vm-pvc-0", "my-vm-pvc-1"], | ||
| "referencedBy": ["backup-2025-01-12"] | ||
| } | ||
| ] | ||
| } | ||
| ``` | ||
| Example of a per-backup manifest: | ||
| ``` | ||
| Per-Backup Manifest (manifests/<backup-name>.json): | ||
| { | ||
| "backupName": "backup-2025-01-12", | ||
| "timestamp": "2025-01-12T10:00:00Z", | ||
| "vms": [ | ||
| { | ||
| "namespace": "default", | ||
| "name": "my-vm", | ||
| "checkpointChain": ["cp-001", "cp-002", "cp-003"] | ||
| } | ||
| ] | ||
| } | ||
| ``` |
Define concurrency control and transactional semantics for metadata operations.
The metadata schema (index.json per VM, manifest per backup) supports reference counting but the design lacks critical operational details:
Concurrency concerns:
- What happens if two backups of the same VM run simultaneously? (Could corrupt index.json)
- How is write contention handled for updating referencedBy arrays during deletion?
- Is there a locking mechanism or do we rely on object store eventual consistency?
Transactional integrity:
- What happens if the controller crashes between uploading qcow2 files and updating index.json?
- How are orphaned qcow2 files detected and cleaned up?
- Is there a two-phase commit pattern (upload data, then atomically commit metadata)?
- What's the recovery procedure for partially written manifests?
Without these guarantees, backup/restore correctness is at risk, especially during failures or concurrent operations. Recommend specifying: locking strategy (optimistic with retries vs distributed lock), atomic metadata updates (test-and-set or versioning), and orphan detection/cleanup procedures.
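One possible shape for the optimistic-concurrency piece, as a hypothetical sketch reusing the VMIIndex type sketched earlier: the ObjectStore interface and ErrConflict stand in for whatever conditional-write primitive the chosen object store (or a Kubernetes Lease fallback) actually provides:

```go
// Hypothetical compare-and-swap loop for index.json updates.
package metadata

import (
	"errors"
	"fmt"
)

var ErrConflict = errors.New("index.json changed since read")

type ObjectStore interface {
	// GetIndex returns the decoded index and an opaque version (e.g. an etag).
	GetIndex(key string) (*VMIIndex, string, error)
	// PutIndexIfMatch writes the index only if the stored version still matches.
	PutIndexIfMatch(key string, idx *VMIIndex, version string) error
}

// updateIndex applies mutate under an optimistic-concurrency retry loop.
func updateIndex(store ObjectStore, key string, mutate func(*VMIIndex), maxRetries int) error {
	for attempt := 0; attempt < maxRetries; attempt++ {
		idx, version, err := store.GetIndex(key)
		if err != nil {
			return err
		}
		mutate(idx)
		err = store.PutIndexIfMatch(key, idx, version)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrConflict) {
			return err
		}
		// Someone else (e.g. a concurrent backup) updated the index; re-read and retry.
	}
	return fmt.Errorf("giving up after %d conflicting updates to %s", maxRetries, key)
}
```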
### Open questions
- How to determine PVC size?
  - user-configurable? configmap or annotation?
  - From the kubevirt enhancement: "Before the process begins, an estimation of the required backup size will be performed. If the provided PVC size is insufficient, an error will be returned"
  - If the PVC is too small, we need a clear error on the backup indicating that it failed due to insufficient PVC space.
  - Since controller is responsible for PVC creation rather than plugin, the controller may be able to respond to PVC too small errors by retrying with a larger PVC.
- The kubevirt datamover controller will be responsible for deleting the `VirtualMachineBackup` resource once it's no longer needed. When should this happen? Upon velero backup deletion? This would enable debugging in the case of failed operations. If we delete it immediately, that will make troubleshooting more difficult. If on backup deletion, we'll need to write a `DeleteItemAction` plugin.
- Do we need an option to force full backups? If we're always doing incremental, eventually the incremental backup list becomes really long, requiring applying possibly hundreds of incremental files for a single restore.
  - For initial release, we can add a force-full-virt-backup annotation on the velero backup. Longer-term, we can push for a general datamover feature in velero which could force full backups for both fs-backup and velero datamover if backup.Spec.ForceFullVolumeBackup is true, and once implemented, the qcow2 datamover can use this as well.

### General notes
- SnapshotMoveData must be true on the backup or DU/DD processing won't work properly
- Longer-term, we can probably eliminate some of the custom code in the new controller by refactoring the velero datamover pluggability features, allowing the node agent to orchestrate this (with a custom image and configuration for datamover pods, etc.)
- The kubevirt enhancement references both "push mode" and "pull mode" -- initial implementation on the kubevirt side will be push mode only. This OADP proposal is also push mode only for the initial implementation
Critical prerequisite validation missing from plugin design.
Line 197 states "SnapshotMoveData must be true on the backup" but this critical prerequisite is buried in general notes rather than enforced in the VirtualMachine BIA plugin (lines 50-62). If SnapshotMoveData is false, the entire datamover flow will fail, yet the design doesn't show where this validation occurs.
Recommend adding explicit validation in the VirtualMachine BIA plugin (around line 54) to check SnapshotMoveData and fail fast with a clear error message if false, rather than allowing the backup to proceed only to fail later in the controller.
🤖 Prompt for AI Agents
In `@docs/design/kubevirt-datamover.md` around lines 186 - 199, The VirtualMachine
BIA plugin currently doesn't enforce the required SnapshotMoveData=true
prerequisite; add an explicit validation at the start of the plugin's
BackupItemAction handling (e.g., in the plugin's Validate/Execute/Run method for
the VirtualMachine BIA plugin referenced around lines 50-62) that inspects
backup.Spec.SnapshotMoveData and immediately return a clear, non-retriable error
(and log a descriptive message) if it is false so the backup fails fast instead
of letting the datamover controller encounter downstream failures.
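A sketch of what that fail-fast check could look like inside the VM plugin, assuming velero's `Backup.Spec.SnapshotMoveData` pointer field; the helper name and error wording are illustrative, not part of the design:

```go
package vmbackup

import (
	"fmt"

	velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
)

// validateSnapshotMoveData would run at the top of the VirtualMachine BIA
// plugin's Execute method, before any DataUpload is created, so the backup
// fails immediately with an actionable message rather than later in the
// kubevirt datamover controller.
func validateSnapshotMoveData(backup *velerov1api.Backup) error {
	if backup.Spec.SnapshotMoveData == nil || !*backup.Spec.SnapshotMoveData {
		return fmt.Errorf("backup %s/%s: spec.snapshotMoveData must be true for the kubevirt qcow2 datamover",
			backup.Namespace, backup.Name)
	}
	return nil
}
```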
Major operational concerns missing from open questions.
The open questions section addresses PVC sizing and force-full backups but omits several critical operational concerns:
Backup retention and lifecycle:
- How do backup retention policies interact with qcow2 checkpoint chains? (If incremental backup N expires before N+1, can we still restore N+1?)
- What happens when backup TTL expires but checkpoints are still referenced by newer backups?
Disaster recovery:
- If the BSL is corrupted or lost, how are the metadata files (index.json, manifests) recovered?
- Is there a secondary copy or audit trail of metadata?
Cross-cluster restore:
- Can backups be restored to a different cluster? (Different storage classes, different kubevirt versions?)
- Are there kubevirt version compatibility requirements for qcow2 format?
Upgrade path:
- What happens when kubevirt or qemu-img versions change? (Backward/forward compatibility of qcow2 format and metadata schema?)
Testing and validation:
- How will the complex multi-component integration be tested? (E2E scenarios, failure injection, etc.)
These gaps represent significant operational and reliability risks for production deployments.
- How to determine PVC size?
  - user-configurable? configmap or annotation?
  - From the kubevirt enhancement: "Before the process begins, an estimation of the required backup size will be performed. If the provided PVC size is insufficient, an error will be returned"
  - If the PVC is too small, we need a clear error on the backup indicating that it failed due to insufficient PVC space.
  - Since controller is responsible for PVC creation rather than plugin, the controller may be able to respond to PVC too small errors by retrying with a larger PVC.
PVC resize retry mechanism needs bounds and backoff strategy.
Line 191 suggests the controller "may be able to respond to PVC too small errors by retrying with a larger PVC" but doesn't specify:
- Maximum PVC size limit (to prevent unbounded storage consumption)
- Resize increment strategy (how much larger on each retry?)
- Maximum number of retry attempts
- Backoff strategy between retries
Without these constraints, the retry loop could consume excessive storage or cause rapid failure loops. Recommend specifying concrete bounds and exponential backoff before implementation.
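One possible shape for those bounds, as a sketch only (the growth factor, ceiling, and attempt limit are illustrative and would presumably come from datamover configuration); sizes are plain bytes here, while real code would likely work with `resource.Quantity`:

```go
package vmbackup

import "fmt"

// nextBackupPVCSize returns the size to try on the next attempt after a
// "PVC too small" failure: grow geometrically, clamp to a configured
// ceiling, and give up after maxAttempts so the controller cannot loop
// forever or consume unbounded storage. The controller would requeue with
// exponential backoff between attempts.
func nextBackupPVCSize(current, ceiling int64, attempt, maxAttempts int, growthFactor float64) (int64, error) {
	if attempt >= maxAttempts {
		return 0, fmt.Errorf("backup PVC still too small after %d attempts", maxAttempts)
	}
	next := int64(float64(current) * growthFactor)
	if next > ceiling {
		next = ceiling
	}
	if next <= current {
		return 0, fmt.Errorf("backup PVC already at the configured ceiling (%d bytes)", ceiling)
	}
	return next, nil
}
```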
@sseago: all tests passed!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: shubham-pampattiwar, sseago.
- Note that unlike the built-in datamover, this DataUpload is tied to a VM (which could include multiple PVCs) rather than to a single PVC.
- An annotation will be added to the DataUpload identifying the VirtualMachine we're backing up.
- Add `velerov1api.DataUploadNameAnnotation` to VirtualMachine
- OperationID will be created and returned similar to what's done with the CSI PVC plugin, and the async operation Progress method will report on progress based on the DU status (similar to CSI PVC plugin)
Just to improve my understanding, this would be: a KubeVirt VM plugin working at the VM level, possibly hitting multiple PVCs, analogous to the existing CSI PVC plugin, which is PVC-specific?
Yes. In both cases, we have a plugin which creates a DataUpload and then in its async Progress method, monitors the DU state until it's done.
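For illustration, a sketch of the annotation step on the VM item (the helper name is hypothetical; `velerov1api.DataUploadNameAnnotation` is the constant referenced in the diff above, and the DataUpload name is whatever the plugin generated when creating the DU):

```go
package vmbackup

import (
	velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// annotateVirtualMachine records the DataUpload name on the VirtualMachine
// item returned from the BIA plugin, mirroring what the CSI PVC plugin does
// for PVCs, so restore-side processing can find the matching data operation.
func annotateVirtualMachine(vm *unstructured.Unstructured, dataUploadName string) {
	annotations := vm.GetAnnotations()
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[velerov1api.DataUploadNameAnnotation] = dataUploadName
	vm.SetAnnotations(annotations)
}
```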
- create the (temporary) PVC.
- identify the VirtualMachine from the PVC metadata.
- From the BSL, identify the latest checkpoint for this VM and confirm that all of the related qcow2 files are available.
- create/update the VirtualMachineBackupTracker CR with `source` set to the VirtualMachine, and `status.latestCheckpoint.name` set to match the latest checkpoint we have for this vm. Note that it might be better if we could avoid hacking the VMBT status and instead have a field on the VM -- something like `spec.overrideCheckpoint` to use a specific checkpoint if the one in the VMBT doesn't match the latest we have in the BSL.
On the first ever backup, we create a VirtualMachineBackupTracker (VMBT) with `source` set to a VirtualMachine resource, and on backup completion `status.latestCheckpoint` is updated by KubeVirt. The VMB uses it, and the usual ops follow. On backup completion, we clean up the VMB resource.
Generally, for a VMB backup, before we create a VMBT:
- Check whether a VMBT for the VM exists (labels: oadp, vm resource name)
- Check whether the checkpoint data matches the BSL (or the `spec.overrideCheckpoint` field -- do we assume this will always be the right value?)
  - If not, update to ensure it is correct
  - If this update somehow fails (checkpoint missing), do a full backup
Remaining ops proceed
Yes, that's right, with one open question in my mind:
"On backup completion, we clean up the VMB resource"
Probably? It's also possible that we'll want to keep it around and clean it up upon backup deletion (similar to what we do for PodVolumeBackup, DataUpload, etc.) -- depends on whether we think VMB will ever be useful for debug purposes.
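A rough sketch of the decision flow described in this exchange, under the assumption that the controller has already listed checkpoints for this VM in the BSL and verified the qcow2 chain; names and types here are placeholders, not kubevirt or velero API:

```go
package vmbackup

type backupMode int

const (
	incrementalBackup backupMode = iota
	fullBackup
)

// decideBackupMode compares the checkpoint the VMBT currently points at with
// the latest checkpoint actually present (and complete) in the BSL.
func decideBackupMode(vmbtLatest, bslLatest string, chainComplete bool) (mode backupMode, updateTracker bool) {
	// Nothing usable in the BSL, or a broken qcow2 chain: an incremental
	// backup would be unrestorable, so force a full backup.
	if bslLatest == "" || !chainComplete {
		return fullBackup, false
	}
	// Tracker already matches what we hold; incremental is safe as-is.
	if vmbtLatest == bslLatest {
		return incrementalBackup, false
	}
	// Tracker is stale (e.g. a prior backup failed after the checkpoint
	// advanced): update the VMBT (or set spec.overrideCheckpoint) to
	// bslLatest before creating the VirtualMachineBackup. If that update
	// fails because the checkpoint no longer exists on the source VM,
	// the controller falls back to a full backup.
	return incrementalBackup, true
}
```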
## High-Level design

### Upstream velero changes
- Update velero volume policy model to allow unrecognized volume policy actions. Velero would treat unrecognized actions as "skip" (i.e. no snapshot or fs-backup), but the kubevirt datamover could only act if the policy action is "kubevirt".
Have we created an issue for this in velero? Do you think this will get pushed back, i.e. that we only allow recognizable policy actions?
No issue created upstream yet. The issue and PR for this are among the tasks I'll be working on this week. And yes, pushback wouldn't completely surprise me. Hopefully we can convince the other maintainers that this is a good thing to allow for 3rd party plugins, etc., but we may have to carry the commit in our fork if we are unsuccessful there.
We have a strong argument here. Our goal is to make Velero more portable and usable for third-party datamovers, and we deliberately want to avoid introducing custom, vendor-specific logic like the vSphere-specific code that exists in some parts of Velero today: https://github.com/search?q=repo%3Avmware-tanzu%2Fvelero%20vsphere&type=code
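As a small sketch of the plugin-side gate, assuming the unrecognized-action change lands upstream (the "kubevirt" action string and the helper name are proposals from this design, not existing velero API):

```go
package vmbackup

// kubevirtPolicyAction is the proposed volume policy action string this
// datamover reacts to; velero core would treat it as an unrecognized action
// and therefore skip its own snapshot/fs-backup handling for the volume.
const kubevirtPolicyAction = "kubevirt"

// shouldUseQcow2Datamover gates the VirtualMachine BIA plugin on the action
// resolved from the backup's volume policies for a given volume.
func shouldUseQcow2Datamover(resolvedAction string) bool {
	return resolvedAction == kubevirtPolicyAction
}
```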
@mpryc @kaovilai @Joeavaikath if time and duties align, please try to have your final thoughts and reviews in by tomorrow. In scrum @sseago will be walking the team through OADP (view) and expanding, explaining, breaking up issues into smaller tasks. Thanks all 0/
Why the changes were made
This PR adds a design document for the VM qcow2 incremental backup feature.
Background:
How to test the changes made