Skip to content

Conversation

@gagan16k
Copy link
Member

@gagan16k gagan16k commented Jan 21, 2026

What this PR does / why we need it:
This PR ensures that finalizers are not added to non-MCM nodes lacking the NotManagedByMCM annotation, as this can lead to testing issues in kubernetes environments.
This is done by verifying that an associated machine exists for the node before adding finalizers.

Changes:

  • Skip processing node updates and cluster node reconciliation if there is no associated machine
  • Replaced the check for a removed finalizer(hasNodeFinalizerBeenRemoved) with a direct check for the presence of the finalizer

Special notes for your reviewer:

IT logs
Random Seed: 1769002937

Will run 10 of 10 specs
------------------------------
[BeforeSuite]
/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager-provider-aws/test/integration/controller/controller_test.go:47
  > Enter [BeforeSuite] TOP-LEVEL @ 01/21/26 19:12:27.07
  STEP: Checking for the clusters if provided are available @ 01/21/26 19:12:27.071
  2026/01/21 19:12:27 Control cluster kube-config - /Users/I765230/go/src/github.com/gagan16k/machine-controller-manager-provider-aws/dev/kube-configs/kubeconfig_control.yaml
  2026/01/21 19:12:27 Target cluster kube-config  - /Users/I765230/go/src/github.com/gagan16k/machine-controller-manager-provider-aws/dev/kube-configs/kubeconfig_target.yaml
  STEP: Killing any existing processes @ 01/21/26 19:12:29.937
  STEP: Checking Machine-Controller-Manager repo is available at: ../../../dev/mcm @ 01/21/26 19:12:30.212
  STEP: Scaledown existing machine controllers @ 01/21/26 19:12:30.212
  STEP: Starting Machine Controller  @ 01/21/26 19:12:30.411
  STEP: Starting Machine Controller Manager @ 01/21/26 19:12:30.424
  STEP: Cleaning any old resources @ 01/21/26 19:12:30.429
  2026/01/21 19:12:30 machinedeployments.machine.sapcloud.io "test-machine-deployment" not found
  2026/01/21 19:12:30 machines.machine.sapcloud.io "test-machine" not found
  2026/01/21 19:12:31 machineclasses.machine.sapcloud.io "test-mc-v1" not found
  2026/01/21 19:12:31 machineclasses.machine.sapcloud.io "test-mc-v2" not found
  STEP: Setup MachineClass @ 01/21/26 19:12:31.231
  STEP: Looking for machineclass resource in the control cluster @ 01/21/26 19:12:32.524
  STEP: Looking for secrets refered in machineclass in the control cluster @ 01/21/26 19:12:32.791
  STEP: Initializing orphan resource tracker @ 01/21/26 19:12:33.156
  2026/01/21 19:12:37 orphan resource tracker initialized
  < Exit [BeforeSuite] TOP-LEVEL @ 01/21/26 19:12:37.821 (10.751s)
[BeforeSuite] PASSED [10.751 seconds]
------------------------------
Machine controllers test machine resource creation should not lead to any errors and add 1 more node in target cluster
/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager/pkg/test/integration/common/framework.go:649
  > Enter [BeforeEach] Machine controllers test @ 01/21/26 19:12:37.822
  STEP: Checking machineController process is running @ 01/21/26 19:12:37.822
  STEP: Checking machineControllerManager process is running @ 01/21/26 19:12:37.822
  STEP: Checking nodes in target cluster are healthy @ 01/21/26 19:12:37.822
  < Exit [BeforeEach] Machine controllers test @ 01/21/26 19:12:38.669 (848ms)
  > Enter [It] should not lead to any errors and add 1 more node in target cluster @ 01/21/26 19:12:38.67
  STEP: Checking for errors @ 01/21/26 19:12:38.88
  STEP: Waiting until number of ready nodes is 1 more than initial nodes @ 01/21/26 19:12:39.082
  < Exit [It] should not lead to any errors and add 1 more node in target cluster @ 01/21/26 19:14:24.677 (1m46.006s)
• [106.854 seconds]
------------------------------
Machine controllers test machine resource deletion when machines available should not lead to errors and remove 1 node in target cluster
/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager/pkg/test/integration/common/framework.go:678
  > Enter [BeforeEach] Machine controllers test @ 01/21/26 19:14:24.677
  STEP: Checking machineController process is running @ 01/21/26 19:14:24.677
  STEP: Checking machineControllerManager process is running @ 01/21/26 19:14:24.677
  STEP: Checking nodes in target cluster are healthy @ 01/21/26 19:14:24.677
  < Exit [BeforeEach] Machine controllers test @ 01/21/26 19:14:25.2 (524ms)
  > Enter [It] should not lead to errors and remove 1 node in target cluster @ 01/21/26 19:14:25.2
  STEP: Checking for errors @ 01/21/26 19:14:26.344
  STEP: Waiting until test-machine machine object is deleted @ 01/21/26 19:14:26.565
  STEP: Waiting until number of ready nodes is equal to number of initial nodes @ 01/21/26 19:14:50.981
  < Exit [It] should not lead to errors and remove 1 node in target cluster @ 01/21/26 19:14:51.83 (26.629s)
• [27.153 seconds]
------------------------------
Machine controllers test machine resource deletion when machines are not available should keep nodes intact
/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager/pkg/test/integration/common/framework.go:717
  > Enter [BeforeEach] Machine controllers test @ 01/21/26 19:14:51.83
  STEP: Checking machineController process is running @ 01/21/26 19:14:51.83
  STEP: Checking machineControllerManager process is running @ 01/21/26 19:14:51.83
  STEP: Checking nodes in target cluster are healthy @ 01/21/26 19:14:51.83
  < Exit [BeforeEach] Machine controllers test @ 01/21/26 19:14:52.263 (433ms)
  > Enter [It] should keep nodes intact @ 01/21/26 19:14:52.263
  STEP: Skipping as there are machines available and this check can't be performed @ 01/21/26 19:14:52.656
  < Exit [It] should keep nodes intact @ 01/21/26 19:14:52.656 (393ms)
• [0.827 seconds]
------------------------------
Machine controllers test machine deployment resource creation with replicas=0, scale up with replicas=1 should not lead to errors and add 1 more node to target cluster
/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager/pkg/test/integration/common/framework.go:745
  > Enter [BeforeEach] Machine controllers test @ 01/21/26 19:14:52.656
  STEP: Checking machineController process is running @ 01/21/26 19:14:52.656
  STEP: Checking machineControllerManager process is running @ 01/21/26 19:14:52.657
  STEP: Checking nodes in target cluster are healthy @ 01/21/26 19:14:52.657
  < Exit [BeforeEach] Machine controllers test @ 01/21/26 19:14:53.31 (653ms)
  > Enter [It] should not lead to errors and add 1 more node to target cluster @ 01/21/26 19:14:53.31
  STEP: Checking for errors @ 01/21/26 19:14:53.524
  STEP: Waiting for Machine Set to be created @ 01/21/26 19:14:53.733
  STEP: Updating machineDeployment replicas to 1 @ 01/21/26 19:14:56.642
  STEP: Checking if machineDeployment's status has been updated with correct conditions @ 01/21/26 19:14:57.039
  STEP: Checking number of ready nodes==1 @ 01/21/26 19:16:56.8
  STEP: Fetching initial number of machineset freeze events @ 01/21/26 19:16:59.004
  < Exit [It] should not lead to errors and add 1 more node to target cluster @ 01/21/26 19:16:59.846 (2m6.537s)
• [127.190 seconds]
------------------------------
Machine controllers test machine deployment resource scale-up with replicas=6 should not lead to errors and add further 5 nodes to target cluster
/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager/pkg/test/integration/common/framework.go:813
  > Enter [BeforeEach] Machine controllers test @ 01/21/26 19:16:59.846
  STEP: Checking machineController process is running @ 01/21/26 19:16:59.846
  STEP: Checking machineControllerManager process is running @ 01/21/26 19:16:59.847
  STEP: Checking nodes in target cluster are healthy @ 01/21/26 19:16:59.847
  < Exit [BeforeEach] Machine controllers test @ 01/21/26 19:17:01.137 (1.29s)
  > Enter [It] should not lead to errors and add further 5 nodes to target cluster @ 01/21/26 19:17:01.137
  STEP: Checking for errors @ 01/21/26 19:17:01.619
  STEP: Checking number of ready nodes are 6 more than initial @ 01/21/26 19:17:01.619
  < Exit [It] should not lead to errors and add further 5 nodes to target cluster @ 01/21/26 19:18:48.822 (1m47.685s)
• [108.976 seconds]
------------------------------
Machine controllers test machine deployment resource scale-down with replicas=2 should not lead to errors and remove 4 nodes from target cluster
/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager/pkg/test/integration/common/framework.go:843
  > Enter [BeforeEach] Machine controllers test @ 01/21/26 19:18:48.823
  STEP: Checking machineController process is running @ 01/21/26 19:18:48.823
  STEP: Checking machineControllerManager process is running @ 01/21/26 19:18:48.823
  STEP: Checking nodes in target cluster are healthy @ 01/21/26 19:18:48.823
  < Exit [BeforeEach] Machine controllers test @ 01/21/26 19:18:49.331 (508ms)
  > Enter [It] should not lead to errors and remove 4 nodes from target cluster @ 01/21/26 19:18:49.331
  STEP: Checking for errors @ 01/21/26 19:18:50.364
  STEP: Checking number of ready nodes are 2 more than initial @ 01/21/26 19:18:50.364
  < Exit [It] should not lead to errors and remove 4 nodes from target cluster @ 01/21/26 19:19:24.175 (34.845s)
• [35.353 seconds]
------------------------------
Machine controllers test machine deployment resource scale-down with replicas=2 should freeze and unfreeze machineset temporarily
/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager/pkg/test/integration/common/framework.go:872
  > Enter [BeforeEach] Machine controllers test @ 01/21/26 19:19:24.175
  STEP: Checking machineController process is running @ 01/21/26 19:19:24.176
  STEP: Checking machineControllerManager process is running @ 01/21/26 19:19:24.176
  STEP: Checking nodes in target cluster are healthy @ 01/21/26 19:19:24.176
  < Exit [BeforeEach] Machine controllers test @ 01/21/26 19:19:24.61 (435ms)
  > Enter [It] should freeze and unfreeze machineset temporarily @ 01/21/26 19:19:24.61
  < Exit [It] should freeze and unfreeze machineset temporarily @ 01/21/26 19:19:25.429 (819ms)
• [1.254 seconds]
------------------------------
Machine controllers test machine deployment resource updation to v2 machine-class and replicas=4 should upgrade machines and add more nodes to target
/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager/pkg/test/integration/common/framework.go:881
  > Enter [BeforeEach] Machine controllers test @ 01/21/26 19:19:25.43
  STEP: Checking machineController process is running @ 01/21/26 19:19:25.43
  STEP: Checking machineControllerManager process is running @ 01/21/26 19:19:25.43
  STEP: Checking nodes in target cluster are healthy @ 01/21/26 19:19:25.43
  < Exit [BeforeEach] Machine controllers test @ 01/21/26 19:19:26.362 (933ms)
  > Enter [It] should upgrade machines and add more nodes to target @ 01/21/26 19:19:26.362
  STEP: Checking for errors @ 01/21/26 19:19:27.165
  STEP: UpdatedReplicas to be 4 @ 01/21/26 19:19:27.166
  STEP: AvailableReplicas to be 4 @ 01/21/26 19:19:36.156
  STEP: Number of ready nodes be 4 more @ 01/21/26 19:21:22.148
  < Exit [It] should upgrade machines and add more nodes to target @ 01/21/26 19:22:07.633 (2m41.271s)
• [162.203 seconds]
------------------------------
Machine controllers test machine deployment resource deletion When there are machine deployment(s) available in control cluster should not lead to errors and list only initial nodes
/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager/pkg/test/integration/common/framework.go:935
  > Enter [BeforeEach] Machine controllers test @ 01/21/26 19:22:07.633
  STEP: Checking machineController process is running @ 01/21/26 19:22:07.633
  STEP: Checking machineControllerManager process is running @ 01/21/26 19:22:07.633
  STEP: Checking nodes in target cluster are healthy @ 01/21/26 19:22:07.633
  < Exit [BeforeEach] Machine controllers test @ 01/21/26 19:22:08.143 (510ms)
  > Enter [It] should not lead to errors and list only initial nodes @ 01/21/26 19:22:08.143
  STEP: Checking for errors @ 01/21/26 19:22:08.33
  STEP: Waiting until number of ready nodes is equal to number of initial  nodes @ 01/21/26 19:22:08.521
  < Exit [It] should not lead to errors and list only initial nodes @ 01/21/26 19:22:45.804 (37.661s)
• [38.171 seconds]
------------------------------
Machine controllers test orphaned resources when the hyperscaler resources are queried should have been deleted
/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager/pkg/test/integration/common/framework.go:972
  > Enter [BeforeEach] Machine controllers test @ 01/21/26 19:22:45.804
  STEP: Checking machineController process is running @ 01/21/26 19:22:45.804
  STEP: Checking machineControllerManager process is running @ 01/21/26 19:22:45.804
  STEP: Checking nodes in target cluster are healthy @ 01/21/26 19:22:45.804
  < Exit [BeforeEach] Machine controllers test @ 01/21/26 19:22:46.261 (457ms)
  > Enter [It] should have been deleted @ 01/21/26 19:22:46.261
  STEP: Querying and comparing @ 01/21/26 19:22:46.261
The following resources are orphans ... trying to delete them
Virtual Machines: []
Volumes: []
NICs: [eni-0a12fd84c9e96dded]
MCM Machines []
   < Exit [It] should have been deleted @ 01/21/26 19:22:57.397 (11.137s)
• [11.593 seconds]
------------------------------
[AfterSuite]
/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager-provider-aws/test/integration/controller/controller_test.go:49
  > Enter [AfterSuite] TOP-LEVEL @ 01/21/26 19:22:57.397
  STEP: Running Cleanup @ 01/21/26 19:22:57.398
  2026/01/21 19:23:17 machinedeployments.machine.sapcloud.io "test-machine-deployment" not found
  2026/01/21 19:23:17 machines.machine.sapcloud.io "test-machine" not found
  2026/01/21 19:23:18 deleting test-mc-v1 machineclass
  2026/01/21 19:23:18 machineclass deleted
  2026/01/21 19:23:18 deleting test-mc-v2 machineclass
  2026/01/21 19:23:19 machineclass deleted
  STEP: Killing any existing processes @ 01/21/26 19:23:19.464
  2026/01/21 19:23:19 controller_manager --control-kubeconfig=/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager-provider-aws/dev/kube-configs/kubeconfig_control.yaml --target-kubeconfig=/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager-provider-aws/dev/kube-configs/kubeconfig_target.yaml --namespace=shoot--i765230--demo --safety-up=2 --safety-down=1 --machine-safety-overshooting-period=300ms --leader-elect=false --v=3
  2026/01/21 19:23:19 stopMCM killed MCM process(es) with pid(s): [20258]
  2026/01/21 19:23:19 main --control-kubeconfig=/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager-provider-aws/dev/kube-configs/kubeconfig_control.yaml --target-kubeconfig=/Users/I765230/go/src/github.com/gagan16k/machine-controller-manager-provider-aws/dev/kube-configs/kubeconfig_target.yaml --namespace=shoot--i765230--demo --machine-creation-timeout=20m --machine-drain-timeout=5m --machine-health-timeout=10m --machine-pv-detach-timeout=2m --machine-safety-apiserver-statuscheck-timeout=30s --machine-safety-apiserver-statuscheck-period=1m --machine-safety-orphan-vms-period=30m --leader-elect=false --v=3
  2026/01/21 19:23:19 stopMCM killed MCM process(es) with pid(s): [20259]
  STEP: Scale back the existing machine controllers @ 01/21/26 19:23:19.709
  < Exit [AfterSuite] TOP-LEVEL @ 01/21/26 19:23:20.458 (23.061s)
[AfterSuite] PASSED [23.061 seconds]
------------------------------

Ran 10 of 10 Specs in 653.388 seconds
SUCCESS! -- 10 Passed | 0 Failed | 0 Pending | 0 Skipped
PASS

Ginkgo ran 1 suite in 11m2.50861825s
Test Suite Passed
Integration tests completed successfully

Release note:

Machine controller no longer adds finalizers or reconciles nodes with no associated machine

@gardener-prow gardener-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 21, 2026
@gagan16k gagan16k marked this pull request as ready for review January 21, 2026 14:00
@gagan16k gagan16k requested a review from a team as a code owner January 21, 2026 14:00
@gardener-prow gardener-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 21, 2026
Copy link
Member

@aaronfern aaronfern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix!
some small comments, please address them

return nil
}

// Avoid handling nodes not managed by MCM.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate this comment similar to what you added in updateNode()?

Copy link
Member Author

@gagan16k gagan16k Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a more detailed comment

// Avoid handling nodes not managed by MCM.
if _, err := c.getMachineFromNode(node.Name); err != nil {
if errors.Is(err, errNoMachineMatch) {
klog.V(4).Infof("ClusterNode %q: No machine found matching node, skipping adding finalizers", key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an error log and not an info log

@aaronfern
Copy link
Member

Can you also edit the release note to say the following

Machine controller no longer adds finalizers or reconciles nodes with no associated machine

@aaronfern aaronfern added the kind/bug Bug label Jan 21, 2026
@gardener-prow gardener-prow bot removed the do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. label Jan 21, 2026
Copy link
Member

@takoverflow takoverflow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one clarification requested PTAL
Thanks for the PR, the changes seem fine to me.

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jan 22, 2026
@gardener-prow
Copy link

gardener-prow bot commented Jan 22, 2026

LGTM label has been added.

DetailsGit tree hash: 00c1d0d91fbf2c6e3bcf8a749661da66239633ef

@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 22, 2026
@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Jan 22, 2026
Copy link
Member

@aaronfern aaronfern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Jan 22, 2026
@gardener-prow
Copy link

gardener-prow bot commented Jan 22, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aaronfern, takoverflow

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [aaronfern,takoverflow]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow
Copy link

gardener-prow bot commented Jan 22, 2026

LGTM label has been added.

DetailsGit tree hash: 87e80c3b2eb0a3bb76e683da413b7d1239a33874

@aaronfern aaronfern merged commit aeffe39 into gardener:master Jan 22, 2026
12 checks passed
aaronfern pushed a commit to aaronfern/machine-controller-manager that referenced this pull request Jan 22, 2026
aaronfern added a commit that referenced this pull request Jan 22, 2026
Co-authored-by: Gagan <gagan.surathkal@gmail.com>
@gagan16k gagan16k deleted the finalizer_fakenode branch January 22, 2026 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/bug Bug lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants