Kubernetes Cluster Scale Fails – Worker Nodes Not Joining After Scale Operation (CloudStack 4.19.0.1) #12577
Replies: 9 comments
-
@vishnuvs369 Could you please let us know the CKS ISO details and which Kubernetes version you used? Also, let us know whether the CKS cluster was HA-enabled.
-
Attaching a screenshot of the k8s ISO image. HA is not enabled; the cluster has 1 control node and 8 worker nodes.
-
@vishnuvs369 Thanks, could you please provide the entire management-server.log?
-
Please find the entire management server log attached.
-
@vishnuvs369 I'm not hitting the issue on 4.22; I was able to scale the CKS cluster successfully with the global setting cloud.kubernetes.cluster.max.size set to 10.
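For reference, a minimal CloudMonkey (cmk) sketch for checking and updating that global setting; the value shown is illustrative and the client is assumed to already be configured against the management server:

  # Check the current value of the CKS cluster size limit
  cmk list configurations name=cloud.kubernetes.cluster.max.size
  # Raise the limit (illustrative value)
  cmk update configuration name=cloud.kubernetes.cluster.max.size value=10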
-
@kiranchavala Thank you for the update. We do have plans to upgrade to the latest CloudStack version in the future. However, at the moment, upgrading or restarting the cluster is not feasible in our environment. In the meantime, could you please advise if there is any workaround or recommended approach to scale the CKS cluster on CloudStack 4.19.0.1 without requiring a cluster restart? Your guidance would be greatly appreciated.
-
@vishnuvs369 For the given CKS cluster, does the cluster network's source NAT IP have any firewall rules in place? Based on the logs you've shared, there is an NPE when attempting to delete the existing firewall rules:
-
@vishnuvs369 Please check the firewall and port-forwarding rules of the CKS network (on the source NAT IP).
Also, you could stop the CKS cluster and change the service offering, if that solves your use case.
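A rough CloudMonkey sketch for inspecting those rules; the UUIDs are placeholders to be substituted from your own environment:

  # Find the source NAT IP of the CKS cluster's isolated network
  cmk list publicipaddresses associatednetworkid=<network-uuid> issourcenat=true
  # List the firewall and port-forwarding rules attached to that IP
  cmk list firewallrules ipaddressid=<ip-uuid>
  cmk list portforwardingrules ipaddressid=<ip-uuid>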
-
@kiranchavala @Pearl1594
-
Problem
We attempted to scale the cluster from 8 to 9 worker nodes using the Scale Cluster option in the CloudStack UI.
The new worker VM was created successfully and is in Running state in CloudStack. However, Kubernetes does not register the new node.
After the scale operation:
CloudStack shows 9 worker nodes
kubectl get nodes continues to show only 8 worker nodes
The UI status is stuck on Scaling.
Error observed in CloudStack Management Server logs:
ERROR ... Unexpected exception while executing ScaleKubernetesClusterCmd
at KubernetesClusterResourceModifierActionWorker.removeSshFirewallRule
at KubernetesClusterScaleWorker.scaleKubernetesClusterIsolatedNetworkRules
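To pull the full stack trace for that error, something along these lines should work on the management server (assuming the default log location):

  # Show the exception with surrounding context from the management server log
  grep -n -A 30 "ScaleKubernetesClusterCmd" /var/log/cloudstack/management/management-server.log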
Versions
Environment details:
CloudStack version: 4.19.0.1
Cluster type: CloudStack Kubernetes Service (CKS)
Initial cluster size:
1 Control Plane
8 Worker Nodes (working fine)
Scale target: 9 Worker Nodes
Global setting cloud.kubernetes.cluster.max.size was increased from 10 to 50 prior to scaling.
The steps to reproduce the bug
Deploy a Kubernetes cluster using CloudStack Kubernetes Service (CKS) on CloudStack 4.19.0.1 with the following configuration:
1 Control Plane node
8 Worker nodes
From the CloudStack UI, navigate to:
Kubernetes → Clusters → <cluster name> → Scale Cluster
Scale the cluster by increasing the worker node count from 8 to 9 and submit the scale operation.
Observe the following behavior:
The new worker VM is created successfully and shows Running state in CloudStack.
The scale task remains stuck in the Scaling state.
kubectl get nodes still shows the node count as 8 (see the verification commands below).
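For completeness, the observations above can be checked from the CLI as well; a rough sketch, with the cluster name as a placeholder:

  # Kubernetes side: the new node never registers
  kubectl get nodes -o wide
  # CloudStack side: cluster state as seen by CKS
  cmk list kubernetesclusters name=<cluster-name>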
What to do about it?
Is there any workaround to resolve this without restarting the cluster?