NO-JIRA: fix: report all degraded controller conditions instead of first only#444 #444

Merged: openshift-merge-bot[bot] merged 1 commit into openshift:main from RadekManak:feature-ccm-flake-fix on Apr 10, 2026
Conversation

@RadekManak
Contributor

@RadekManak RadekManak commented Apr 10, 2026

Summary

  • checkControllerConditions() iterated over the CO status conditions and returned early on the first degraded condition it found. When both CloudConfigController and TrustedCABundleController were degraded simultaneously, only one was reflected in the operator's Degraded status message: whichever happened to appear first in the slice.
  • This caused a flake in the e2e test for OCP-70566 (openshift/openshift-tests-private#29732) where the test expected TrustedCABundleControllerControllerDegraded in the message but got CloudConfigControllerDegraded instead.

Fix

Collect all degraded conditions in a single pass and return a combined error message, so all failing controllers are always reported.

Test plan

  • Re-run [sig-cluster-lifecycle] Cluster_Infrastructure CCM Medium-70566-Garbage in cloud-controller-manager status — the Degraded message will now contain both conditions when both controllers fail, making the test deterministic

@coderabbitai

coderabbitai bot commented Apr 10, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 12af340c-50a0-4760-81d2-dea03bc469d7

📥 Commits

Reviewing files that changed from the base of the PR and between 07c949d and 099bb22.

📒 Files selected for processing (1)
  • pkg/controllers/clusteroperator_controller.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/controllers/clusteroperator_controller.go

Walkthrough

Modified checkControllerConditions in the cluster operator controller to collect degraded condition messages for both cloudConfigControllerDegradedCondition and trustedCABundleControllerDegradedCondition, sort them deterministically, then return a single aggregated error instead of returning on the first degraded condition. Added sort and strings imports.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Cluster Operator Controller: pkg/controllers/clusteroperator_controller.go | Changed checkControllerConditions to accumulate matching degraded condition messages into a slice, sort the messages, and return one aggregated error (joined with "; "). Updated imports to include sort and strings. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

@openshift-ci openshift-ci bot requested review from nrb and racheljpg April 10, 2026 14:08

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
pkg/controllers/clusteroperator_controller.go (1)

324-326: Consider sorting degraded messages before join to make error text fully deterministic.

The current output order depends on co.Status.Conditions iteration order. Sorting before strings.Join makes exact message text stable across reconciles/tests.

Suggested diff
 import (
 	"context"
 	"fmt"
+	"sort"
 	"strings"
@@
 	if len(degradedMessages) > 0 {
+		sort.Strings(degradedMessages)
 		return false, fmt.Errorf("failed to apply resources because %s", strings.Join(degradedMessages, "; "))
 	}
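
The effect of the suggested sort can be illustrated in isolation. The sketch below is an assumption-laden example rather than the repository's code: `aggregateDegraded` is a hypothetical helper, and the sample messages are made up. It shows that sorting before `strings.Join` yields the same error text no matter what order the conditions were encountered in.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// aggregateDegraded is a hypothetical helper that builds one error from
// the collected messages. It sorts a copy so the caller's slice is left
// untouched and the resulting text does not depend on iteration order.
func aggregateDegraded(degradedMessages []string) error {
	if len(degradedMessages) == 0 {
		return nil
	}
	sorted := append([]string(nil), degradedMessages...)
	sort.Strings(sorted)
	return fmt.Errorf("failed to apply resources because %s", strings.Join(sorted, "; "))
}

func main() {
	// The same two messages, encountered in opposite orders.
	a := aggregateDegraded([]string{
		"TrustedCABundleControllerControllerDegraded: example message",
		"CloudConfigControllerDegraded: example message",
	})
	b := aggregateDegraded([]string{
		"CloudConfigControllerDegraded: example message",
		"TrustedCABundleControllerControllerDegraded: example message",
	})
	fmt.Println(a.Error() == b.Error()) // prints "true"
}
```

Sorting a copy is a design choice made here for illustration; the suggested diff sorts the slice in place, which is equally valid when the slice is local to the function.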
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controllers/clusteroperator_controller.go` around lines 324 - 326, The
error string built from degradedMessages is non-deterministic because it
reflects iteration order; before calling strings.Join(degradedMessages, "; "),
sort the slice (e.g., use sort.Strings(degradedMessages)) so the output is
stable across runs, and add the necessary import for "sort"; update the block
that returns fmt.Errorf("failed to apply resources because %s",
strings.Join(degradedMessages, "; ")) to sort degradedMessages first.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c22b36ca-2842-47a3-9960-a96b40ff44a2

📥 Commits

Reviewing files that changed from the base of the PR and between b4e6117 and 07c949d.

📒 Files selected for processing (1)
  • pkg/controllers/clusteroperator_controller.go

checkControllerConditions() was returning early on the first degraded
condition it encountered, so when multiple controllers were degraded
simultaneously only one would be reflected in the operator's Degraded
status message. Collect all degraded conditions and report them together.
@nrb
Contributor

nrb commented Apr 10, 2026

I should probably integrate this into #436.

@RadekManak
Contributor Author

Feel free to do so.

@openshift-ci
Contributor

openshift-ci bot commented Apr 10, 2026

@RadekManak: all tests passed!

@nrb
Contributor

nrb commented Apr 10, 2026

/approve
/lgtm
/verified by CI

@openshift-ci-robot openshift-ci-robot added the "verified" label (signifies that the PR passed pre-merge verification criteria) on Apr 10, 2026
@openshift-ci-robot

@nrb: This PR has been marked as verified by CI.

@nrb
Contributor

nrb commented Apr 10, 2026

/retitle NO-JIRA: fix: report all degraded controller conditions instead of first only#444

@openshift-ci openshift-ci bot added the "lgtm" label (indicates that a PR is ready to be merged) on Apr 10, 2026
@openshift-ci
Contributor

openshift-ci bot commented Apr 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: nrb

@openshift-ci openshift-ci bot changed the title from "fix: report all degraded controller conditions instead of first only" to "NO-JIRA: fix: report all degraded controller conditions instead of first only#444" on Apr 10, 2026
@openshift-ci openshift-ci bot added the "approved" label (indicates a PR has been approved by an approver from all required OWNERS files) on Apr 10, 2026
@openshift-ci-robot openshift-ci-robot added the "jira/valid-reference" label (indicates that this PR references a valid Jira ticket of any type) on Apr 10, 2026
@openshift-ci-robot

@RadekManak: This pull request explicitly references no jira issue.

@openshift-merge-bot openshift-merge-bot bot merged commit 1eccfe3 into openshift:main Apr 10, 2026
13 checks passed