
Enable kubernetes_node_scale benchmark (up to 5k nodes) on AWS EKS with Karpenter#6512

Draft
kiryl-filatau wants to merge 19 commits into GoogleCloudPlatform:master from kiryl-filatau:aws-5k

kiryl-filatau commented Mar 4, 2026

Summary

Enables running the kubernetes_node_scale benchmark (0 → 5k → 0 → 5k nodes) on AWS EKS with Karpenter. The benchmark scales a Deployment that uses pod anti-affinity, measures the first scale-up, the scale-down, and a second scale-up, and then tears down the cluster.
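The phase sequence above can be pictured as a loop: scale the Deployment to each target and time how long the node count takes to follow. This is a minimal sketch, not the benchmark's implementation. run_scale_phases, scale_replicas, and count_ready_nodes are hypothetical names, and the sketch assumes that pod anti-affinity places one replica per node and that the node count covers only Karpenter-provisioned nodes, so the replica count equals the desired node count.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PhaseResult:
    name: str
    target_nodes: int
    seconds: float
    reached: bool


def run_scale_phases(
    targets: List[int],
    scale_replicas: Callable[[int], None],
    count_ready_nodes: Callable[[], int],
    timeout_s: float,
    poll_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[PhaseResult]:
    """Scales to each target in turn and times how long nodes take to match."""
    results: List[PhaseResult] = []
    for i, target in enumerate(targets):
        # Assumption: anti-affinity means one replica per node, so
        # replicas == desired number of (Karpenter-managed) nodes.
        scale_replicas(target)
        start = clock()
        reached = False
        while clock() - start < timeout_s:
            if count_ready_nodes() == target:
                reached = True
                break
            sleep(poll_s)
        results.append(PhaseResult(f"phase{i + 1}", target, clock() - start, reached))
    return results
```

For example, targets of [5000, 0, 5000] would model the 0 → 5k → 0 → 5k run. Injecting clock and sleep keeps the sketch testable without a cluster; a real run might back scale_replicas with kubectl scale and count_ready_nodes with a node listing filtered to the NodePool.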

Main changes

  • kubernetes_node_scale benchmark: template and scaling logic (scale-up, scale-down, and phase handling), metrics collection, and timeouts tuned for large runs.
  • EKS + Karpenter: NodePool template (allowed instance types now include the t family, and the CPU limit is raised), plus EKS/Karpenter cluster lifecycle and cleanup.
  • Teardown robustness: orphaned ENI deletion in _CleanupKarpenter now retries with backoff when AWS throttles the request (RequestLimitExceeded) and treats “ENI not found” as success; both cases are handled via suppress_failure.
  • Tracker: _StopWatchingForNodeChanges now makes a single kubectl get nodes pass. Machine type is resolved only for nodes in that listing, and all other nodes are recorded as "unknown", which avoids thousands of kubectl calls on 5k-node runs.
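The throttle and not-found handling in the teardown change can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in _CleanupKarpenter: delete_orphan_eni, eni_already_gone, and the injected run_cmd runner (assumed to return stdout, stderr, and a return code) are hypothetical names, and the three-argument predicate only mimics the shape of a suppress_failure callback. The aws ec2 delete-network-interface command and the EC2 error codes RequestLimitExceeded and InvalidNetworkInterfaceID.NotFound are real; detecting them by substring match on stderr is an assumption.

```python
import time
from typing import Callable, List, Tuple

# (stdout, stderr, return code): the shape run_cmd is assumed to return.
CommandResult = Tuple[str, str, int]

THROTTLE_CODE = "RequestLimitExceeded"
NOT_FOUND_CODE = "InvalidNetworkInterfaceID.NotFound"


def eni_already_gone(stdout: str, stderr: str, retcode: int) -> bool:
    """suppress_failure-style predicate: a missing ENI counts as deleted."""
    return retcode != 0 and NOT_FOUND_CODE in stderr


def delete_orphan_eni(
    eni_id: str,
    run_cmd: Callable[[List[str]], CommandResult],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Deletes one ENI, retrying only on throttling. Returns True if gone."""
    cmd = ["aws", "ec2", "delete-network-interface",
           "--network-interface-id", eni_id]
    for attempt in range(max_attempts):
        stdout, stderr, retcode = run_cmd(cmd)
        if retcode == 0 or eni_already_gone(stdout, stderr, retcode):
            return True
        if THROTTLE_CODE not in stderr:
            return False  # Not a throttle: retrying will not help.
        if attempt < max_attempts - 1:
            # Exponential backoff: 1 s, 2 s, 4 s, ... with the defaults.
            sleep(base_delay_s * 2 ** attempt)
    return False
```

Backing off exponentially spaces retries out so that a teardown deleting many ENIs at once stops hammering the EC2 API, while a non-throttle error fails fast instead of consuming retries. Jitter is omitted here for brevity.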
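The tracker change can be illustrated by resolving machine types from one kubectl get nodes -o json listing instead of issuing a kubectl call per node. In this minimal sketch, resolve_machine_types is a hypothetical helper, and reading the type from the well-known node.kubernetes.io/instance-type label is an assumption; the PR may obtain it differently. Nodes seen during the run but absent from the listing are mapped to "unknown".

```python
import json
from typing import Dict, Iterable

# Well-known Kubernetes node label; assumed here as the machine-type source.
INSTANCE_TYPE_LABEL = "node.kubernetes.io/instance-type"
UNKNOWN = "unknown"


def resolve_machine_types(seen_nodes: Iterable[str], nodes_json: str) -> Dict[str, str]:
    """Maps every node seen during the run to a machine type from one listing.

    nodes_json is assumed to be the output of a single
    `kubectl get nodes -o json` call.
    """
    current: Dict[str, str] = {}
    for item in json.loads(nodes_json).get("items", []):
        metadata = item.get("metadata", {})
        name = metadata.get("name")
        if name:
            labels = metadata.get("labels") or {}
            current[name] = labels.get(INSTANCE_TYPE_LABEL, UNKNOWN)
    # Nodes already removed (e.g. after scale-down) are not queried again.
    return {node: current.get(node, UNKNOWN) for node in seen_nodes}
```

The cost stays at one listing regardless of how many nodes came and went during the run, at the price of losing the machine type for nodes that no longer exist.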

NOTE: Some hardcoded values still need to be updated.