Conversation

@thiyyakat (Member) commented Dec 10, 2025

What this PR does / why we need it:

This PR introduces a feature that allows operators and end users to preserve a machine/node and the backing VM for diagnostic purposes.

The expected behaviour, use cases, and usage are detailed in the proposal, which can be found here

Which issue(s) this PR fixes:
Fixes #1008

Special notes for your reviewer:

The following tests were carried out serially with the machine-controller-manager-provider-virtual: #1059 (comment)

Please also take a look at the questions asked here.

Release note:

Introduce support for preservation of machines (both Running and Failed), and the backing node (if it exists). 

@gardener-robot gardener-robot added kind/api-change API change with impact on API users needs/second-opinion Needs second review by someone else needs/rebase Needs git rebase labels Dec 10, 2025
@gardener-robot commented:

@thiyyakat You need to rebase this pull request with the latest master branch. Please check.

@gardener-robot gardener-robot added needs/review Needs review size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 10, 2025
@thiyyakat thiyyakat force-pushed the feat/preserve-machine branch 2 times, most recently from 06ecf58 to 89f2900 Compare December 10, 2025 12:06
@thiyyakat (Member, Author) commented Dec 11, 2025

Questions that remain unanswered:

  1. On recovery of a preserved machine, it transitions from Failed to Running. However, if the preserve annotation was set to when-failed, the node continues to be preserved in Running even though the annotation says when-failed. Is that okay? The node needs to stay preserved so that pods can be scheduled onto it without CA scaling it down.
  2. The drain timeout is currently checked by calculating the time elapsed from LastUpdateTime (set when the machine moved to Failed) until now. Is there a better way to do it?
    timeOutOccurred = utiltime.HasTimeOutOccurred(machine.Status.CurrentStatus.LastUpdateTime, timeOutDuration)
    In the normal drain, the timeout is checked with respect to DeletionTimestamp.
  3. In some parts of the code, the returned error is checked for a Conflict, and ConflictRetry rather than ShortRetry is returned. When should these checks be performed? The preservation flow has a lot of update calls. Addressed: use ConflictRetry when appropriate (see the sketch below).
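
A minimal sketch of the conflict-aware retry choice from item 3. It assumes the ConflictRetry/ShortRetry retry periods mentioned above live in the machineutils package; the import path and the RetryPeriod type name are assumptions and may differ from the actual code.

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"

	"github.com/gardener/machine-controller-manager/pkg/util/provider/machineutils"
)

// retryPeriodForUpdateError picks the requeue period after a failed update call:
// optimistic-concurrency conflicts get ConflictRetry so the reconciler re-reads
// the object and retries promptly; all other errors fall back to ShortRetry.
func retryPeriodForUpdateError(err error) machineutils.RetryPeriod {
	if apierrors.IsConflict(err) {
		return machineutils.ConflictRetry
	}
	return machineutils.ShortRetry
}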

@thiyyakat (Member, Author) left a comment

Note: A review meeting was held today for this PR. The comments were given during the meeting.

During the meeting, we revisited the decision to perform the drain in the Failed state for preserved machines. The reason discussed previously was that it does not make sense semantically to move the machine to Terminating and then drain it, because the machine may still recover. Since Terminating is a final state, the drain (separate from the drain in triggerDeletionFlow) will be performed in the Failed phase. No change was proposed during the meeting; this design decision was only reconfirmed.

@takoverflow (Member) left a comment

Have only gone through half of the PR, have some suggestions PTAL.

Comment on lines 2475 to 2493
err := nodeops.AddOrUpdateConditionsOnNode(ctx, c.targetCoreClient, nodeName, preservedCondition)
if err != nil {
	return err
}
// Step 2: remove CA's scale-down disabled annotations to allow CA to scale down node if needed
CAAnnotations := make(map[string]string)
CAAnnotations[autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey] = ""
latestNode, err := c.targetCoreClient.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
if err != nil {
	klog.Errorf("error trying to get backing node %q for machine %s. Retrying, error: %v", nodeName, machine.Name, err)
	return err
}
latestNodeCopy := latestNode.DeepCopy()
latestNodeCopy, _, _ = annotations.RemoveAnnotation(latestNodeCopy, CAAnnotations) // error can be ignored, always returns nil
_, err = c.targetCoreClient.CoreV1().Nodes().Update(ctx, latestNodeCopy, metav1.UpdateOptions{})
if err != nil {
	klog.Errorf("Node UPDATE failed for node %q of machine %q. Retrying, error: %s", nodeName, machine.Name, err)
	return err
}
@takoverflow (Member) commented Dec 18, 2025

Is there a reason why two Get and Update calls are made for a node? Can these not be combined into a single atomic node object update?

And I know this is not part of your PR, but can we update this RemoveAnnotation function? It's needlessly complicated.
All you have to do after fetching the object and checking that the annotations map is non-nil is

delete(obj.Annotations, annotationKey)

Currently a dummy annotation map is created, passed in, and then a new map is built without the key. All of this complication can be avoided.
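
For concreteness, a short sketch of what the call site could look like with that suggestion, reusing the identifiers from the quoted snippet above (an illustration, not the PR's final code):

// Drop the CA scale-down-disabled annotation directly instead of building a
// dummy map and calling annotations.RemoveAnnotation.
latestNodeCopy := latestNode.DeepCopy()
if latestNodeCopy.Annotations != nil {
	delete(latestNodeCopy.Annotations, autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey)
}
if _, err := c.targetCoreClient.CoreV1().Nodes().Update(ctx, latestNodeCopy, metav1.UpdateOptions{}); err != nil {
	klog.Errorf("Node UPDATE failed for node %q of machine %q. Retrying, error: %s", nodeName, machine.Name, err)
	return err
}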

@thiyyakat (Member, Author) commented Dec 23, 2025

By the two Get() calls, are you referring to the call within AddOrUpdateConditionsOnNode and the following Get() here:
latestNode, err := c.targetCoreClient.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})?

The first one could be avoided if we didn't use the function. The second one is required because step 1 adds conditions to the node object and the function does not return the updated node object, while fetching from the cache doesn't guarantee an up-to-date node object (I tested this empirically). I could potentially avoid fetching the object if I didn't use the function; I will test it out.

The two update calls cannot be combined, since step 1 writes node conditions and requires an UpdateStatus() call, while step 2 removes an annotation and requires a regular Update() call.

I will update the RemoveAnnotation function as recommended by you.

Edit: The RemoveAnnotation function returns a boolean indicating whether an update is needed, and that value is used by other callers, so the function cannot be changed. I will use your suggestion directly instead of calling the function, since the boolean value is not required in this case.
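
For context, a minimal sketch of the two separate write paths described above, assuming the standard client-go typed clientset and the identifiers from the quoted snippet (illustrative, not the exact PR code):

// Step 1: the Preserved condition lives under the node's status, so it goes
// through the status subresource.
nodeCopy := latestNode.DeepCopy()
nodeCopy.Status.Conditions = append(nodeCopy.Status.Conditions, preservedCondition)
updatedNode, err := c.targetCoreClient.CoreV1().Nodes().UpdateStatus(ctx, nodeCopy, metav1.UpdateOptions{})
if err != nil {
	return err
}

// Step 2: the CA scale-down-disabled annotation is object metadata, so it needs
// a regular Update. Working on the object returned by UpdateStatus keeps the
// resourceVersion fresh without an extra Get in this illustration.
delete(updatedNode.Annotations, autoscaler.ClusterAutoscalerScaleDownDisabledAnnotationKey)
if _, err := c.targetCoreClient.CoreV1().Nodes().Update(ctx, updatedNode, metav1.UpdateOptions{}); err != nil {
	return err
}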

@gardener-robot gardener-robot added the needs/changes Needs (more) changes label Dec 18, 2025
@thiyyakat thiyyakat force-pushed the feat/preserve-machine branch from 22c646e to 7c062b5 Compare December 19, 2025 08:30
@takoverflow (Member) left a comment

Just some minor comments, will take a proper look once the annotation logic's revised.

@gardener-prow bot commented Jan 19, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign unmarshall for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@thiyyakat thiyyakat force-pushed the feat/preserve-machine branch from 4c9f267 to 0c9890f Compare January 19, 2026 08:41
@takoverflow (Member) left a comment

Still going through the PR, will be adding more comments later.

@gagan16k (Member) left a comment

Have a few comments, PTAL

- Modify sort function to de-prioritize preserve machines
- Add test for the same
- Improve logging
- Fix bug in stopMachinePreservationIfPreserved when node is not found
- Update default MachinePreserveTimeout to 3 days as per doc
- Reuse function to write annotation on machine
- Minor refactoring
@thiyyakat (Member, Author) commented:

Manual Testing carried out with MCM-P-virtual

The following tests were carried out serially

Annotating node object

Annotating with "preserve=now"

  • annotation present on node
  • CA scale down disabled annotation present on node
  • Node Condition (type Preserved) added
    • Reason set to "Preserved by User"
    • Status set to "True"
  • annotation synced to machine
  • PreserveExpiryTime set in machine.CurrentStatus
Node object:
apiVersion: v1
kind: Node
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
    node.machine.sapcloud.io/preserve: now
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
...
conditions:
  ...
  - lastHeartbeatTime: null
    lastTransitionTime: "2026-01-23T06:07:17Z"
    reason: PreservedByUser
    status: "True"
    type: Preserved

Machine object:

apiVersion: machine.sapcloud.io/v1alpha1
kind: Machine
metadata:
  annotations:
    machinepriority.machine.sapcloud.io: "3"
    node.machine.sapcloud.io/preserve: now
...
 currentStatus:
    lastUpdateTime: "2026-01-23T06:07:17Z"
    phase: Running
    preserveExpiryTime: "2026-01-23T08:07:17Z"

Annotating with "preserve=false"

  • annotation present on node
  • CA scale down disabled annotation removed from node
  • Node Condition (type Preserved) present
    • Reason set to "PreservationStopped"
    • Status set to "False"
  • annotation synced to machine
  • PreserveExpiryTime set to nil in machine.CurrentStatus
Node object:

apiVersion: v1
kind: Node
metadata:
  annotations:
    node.machine.sapcloud.io/preserve: "false"
...
conditions:
  - lastHeartbeatTime: null
    lastTransitionTime: "2026-01-23T06:14:11Z"
    reason: PreservationStopped
    status: "False"
    type: Preserved

Machine object:

apiVersion: machine.sapcloud.io/v1alpha1
kind: Machine
metadata:
  annotations:
    machinepriority.machine.sapcloud.io: "3"
    node.machine.sapcloud.io/preserve: "false"
...
 currentStatus:
    lastUpdateTime: "2026-01-23T06:14:11Z"
    phase: Running

Annotated "preserve=when-failed"

When machine is Running

  • annotation present on node
  • CA scale down disabled annotation present on node
  • Node Condition no change
  • annotation synced to machine
  • PreserveExpiryTime not set
Node object:
apiVersion: v1
kind: Node
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
    node.machine.sapcloud.io/preserve: when-failed
...
conditions:  # (remains unchanged after setting to false)
  - lastHeartbeatTime: null
    lastTransitionTime: "2026-01-23T06:14:11Z"
    reason: PreservationStopped
    status: "False"
    type: Preserved

Machine object:

apiVersion: machine.sapcloud.io/v1alpha1
kind: Machine
metadata:
  annotations:
    machinepriority.machine.sapcloud.io: "3"
    node.machine.sapcloud.io/preserve: when-failed
...
 currentStatus:  # no change
    lastUpdateTime: "2026-01-23T06:14:11Z"
    phase: Running

When machine has Failed

  • Node Condition (type Preserved) changed
    • Reason set to "Preserved by User"
    • Status set to "True"
    • Message set to "Preserved node drained successfully"
  • annotation synced to machine
  • PreserveExpiryTime set in machine.CurrentStatus
Node object:
spec:
  providerID: aws:///eu-west-1/i-8a1a5bc23ca0f7c53
  unschedulable: true
...
conditions:
  - lastHeartbeatTime: null
    lastTransitionTime: "2026-01-23T06:48:58Z"
    message: Preserved node drained successfully
    reason: PreservedByUser
    status: "True"
    type: Preserved

Machine object:

 currentStatus:
    lastUpdateTime: "2026-01-23T06:48:58Z"
    phase: Failed
    preserveExpiryTime: "2026-01-23T08:48:58Z"

When machine & node with "preserve=when-failed" recover to Running

  • annotation still present on node and machine
  • CA scale down disabled annotation still present on node
  • Node Condition (type Preserved) present
    • Reason set to "PreservationStopped"
    • Status set to "False"
  • PreserveExpiryTime set to nil in machine.CurrentStatus
  • Node no longer marked "unschedulable"
Node object:
apiVersion: v1
kind: Node
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
    node.machine.sapcloud.io/preserve: when-failed
# when the node was not ready, spec.unschedulable was set to true
...
conditions:
  - lastHeartbeatTime: null
    lastTransitionTime: "2026-01-23T06:54:59Z"
    reason: PreservationStopped
    status: "False"
    type: Preserved

Machine object:

apiVersion: machine.sapcloud.io/v1alpha1
kind: Machine
metadata:
  annotations:
    machinepriority.machine.sapcloud.io: "3"
    node.machine.sapcloud.io/preserve: when-failed
...
 currentStatus:
    lastUpdateTime: "2026-01-23T06:54:59Z"
    phase: Running
  lastOperation:
    description: Machine shoot--i749592--test-worker-test-z1-56676-7lznl successfully
      re-joined the cluster

Annotating machine object

Annotating with "preserve=when-failed"

  • Annotation present on machine
  • Annotation NOT synced to node
  • CA scale down disabled annotation added on node
  • No Node Condition of type=Preserved added on node
  • No PreserveExpiryTime set
Machine object:
apiVersion: machine.sapcloud.io/v1alpha1
kind: Machine
metadata:
  annotations:
    machinepriority.machine.sapcloud.io: "3"
    node.machine.sapcloud.io/preserve: when-failed
...
 currentStatus:
    lastUpdateTime: "2026-01-23T06:58:45Z"
    phase: Running

Node object:

apiVersion: v1
kind: Node
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2026-01-23T06:58:35Z"

When machine moved to Failed

  • Annotation unchanged on machine
  • Node still has no preserve annotation
  • PreserveExpiryTime must be set
  • CA scale down disabled annotation still present on node
  • Node condition (type=Preserved) set on node:
    • Reason set to "Preserved by User"
    • Status set to "True"
    • Message set to "Preserved node drained successfully"
  • Node marked unschedulable
Machine object:
apiVersion: machine.sapcloud.io/v1alpha1
kind: Machine
metadata:
  annotations:
    machinepriority.machine.sapcloud.io: "3"
    node.machine.sapcloud.io/preserve: when-failed
...
  currentStatus:
    lastUpdateTime: "2026-01-23T07:10:18Z"
    phase: Failed
    preserveExpiryTime: "2026-01-23T09:10:18Z"

Node object:

apiVersion: v1
kind: Node
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
    volumes.kubernetes.io/controller-managed-attach-detach: "true"
  creationTimestamp: "2026-01-23T06:58:35Z"
...
spec:
  unschedulable: true
...
conditions:
  - lastHeartbeatTime: null
    lastTransitionTime: "2026-01-23T07:10:18Z"
    message: Preserved node drained successfully
    reason: PreservedByUser
    status: "True"
    type: Preserved

When machine recovers to Running

  • Machine annotation remains unchanged
  • PreserveExpiryTime cleared
  • Node Conditions (type=Preserved) changed:
    • Reason set to "PreservationStopped"
    • Status set to "False"
  • Node no longer unschedulable
  • CA scale-down annotation still present on node
Machine object:
apiVersion: machine.sapcloud.io/v1alpha1
kind: Machine
metadata:
  annotations:
    machinepriority.machine.sapcloud.io: "3"
    node.machine.sapcloud.io/preserve: when-failed
...
  currentStatus:
    lastUpdateTime: "2026-01-23T07:29:18Z"
    phase: Running
  lastOperation:
    description: Machine shoot--i749592--test-worker-test-z1-56676-ndqfp successfully
      re-joined the cluster

Node object:

apiVersion: v1
kind: Node
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
...
# spec.unschedulable is no longer set
conditions:
  - lastHeartbeatTime: null
    lastTransitionTime: "2026-01-23T07:29:18Z"
    reason: PreservationStopped
    status: "False"
    type: Preserved

Scaling down machineset with preserved machines and unpreserved machines

  • shoot--i749592--test-worker-test-z1-56676-9wt9c --> annotated with "preserve=now"
  • shoot--i749592--test-worker-test-z1-56676-ndqfp --> annotated with "preserve=when-failed" and node marked Not Ready
  • shoot--i749592--test-worker-test-z1-56676-xsrdj --> unpreserved
❯ k get mc
NAME                                              STATUS    AGE    NODE
shoot--i749592--test-worker-test-z1-56676-9wt9c   Running   111s   shoot--i749592--test-worker-test-z1-56676-9wt9c
shoot--i749592--test-worker-test-z1-56676-ndqfp   Failed    107m   shoot--i749592--test-worker-test-z1-56676-ndqfp
shoot--i749592--test-worker-test-z1-56676-xsrdj   Running   111s   shoot--i749592--test-worker-test-z1-56676-xsrdj
❯ k scale mcd shoot--i749592--test-worker-test-z1 --replicas=2
machinedeployment.machine.sapcloud.io/shoot--i749592--test-worker-test-z1 scaled
❯ k get mc
NAME                                              STATUS        AGE    NODE
shoot--i749592--test-worker-test-z1-56676-9wt9c   Running       2m1s   shoot--i749592--test-worker-test-z1-56676-9wt9c
shoot--i749592--test-worker-test-z1-56676-ndqfp   Failed        107m   shoot--i749592--test-worker-test-z1-56676-ndqfp
shoot--i749592--test-worker-test-z1-56676-xsrdj   Terminating   2m1s   shoot--i749592--test-worker-test-z1-56676-xsrdj
❯ k scale mcd shoot--i749592--test-worker-test-z1 --replicas=1
machinedeployment.machine.sapcloud.io/shoot--i749592--test-worker-test-z1 scaled
❯ k get mc
NAME                                              STATUS        AGE     NODE
shoot--i749592--test-worker-test-z1-56676-9wt9c   Running       2m12s   shoot--i749592--test-worker-test-z1-56676-9wt9c
shoot--i749592--test-worker-test-z1-56676-ndqfp   Terminating   107m    shoot--i749592--test-worker-test-z1-56676-ndqfp
shoot--i749592--test-worker-test-z1-56676-xsrdj   Terminating   2m12s   shoot--i749592--test-worker-test-z1-56676-xsrdj

The first machine to be scaled down was the unpreserved Running machine, followed by the preserved Failed machine, and lastly the preserved Running machine.
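
The observed ordering matches the intent of the "Modify sort function to de-prioritize preserve machines" change listed earlier. A hypothetical ranking function illustrating that ordering (names and logic are illustrative only, not the actual MCM sort code):

import (
	"github.com/gardener/machine-controller-manager/pkg/apis/machine/v1alpha1"
)

// deletionRank orders scale-down candidates: lower rank is deleted first.
// Unpreserved machines go first, preserved Failed machines next, and preserved
// Running machines are kept the longest, matching the test output above.
func deletionRank(m *v1alpha1.Machine) int {
	preserve := m.Annotations["node.machine.sapcloud.io/preserve"]
	preserved := preserve != "" && preserve != "false"
	switch {
	case !preserved:
		return 0
	case m.Status.CurrentStatus.Phase == v1alpha1.MachineFailed:
		return 1
	default:
		return 2
	}
}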

When machine moves to Failed and AutoPreserveFailedMachineMax=1

  • Machine should be annotated with "preserve=auto-preserved"
  • PreserveExpiryTime should be set on machine
  • Node condition (type=Preserved) should be set on the node
    • Status: true
    • Message: Preserved node drained successfully
    • Reason: PreservedByMCM
  • CA scale-down disabled annotation should be set on node
Machine object:
apiVersion: machine.sapcloud.io/v1alpha1
kind: Machine
metadata:
  annotations:
    machinepriority.machine.sapcloud.io: "3"
    node.machine.sapcloud.io/preserve: auto-preserved
...
  currentStatus:
    lastUpdateTime: "2026-01-23T09:22:53Z"
    phase: Failed
    preserveExpiryTime: "2026-01-26T09:22:53Z"

Node object:

apiVersion: v1
kind: Node
metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/scale-down-disabled: "true"
spec:
  unschedulable: true
...
conditions:
  - lastHeartbeatTime: null
    lastTransitionTime: "2026-01-23T09:22:53Z"
    message: Preserved node drained successfully
    reason: PreservedByMCM
    status: "True"
    type: Preserved

- Make changes to add auto-preserve-stopped on recovered, auto-preserved previously failed machines.
- Change stopMachinePreservationIfPreserved to remove the CA annotation when preserve=false on a recovered failed, preserved machine