OCPBUGS-69447: feat(updates): enable CVO metrics access with RHOBS monitoring flag#7399
Conversation
|
@Chee-Lu: This pull request references OCM-10395 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured. (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the
WalkthroughAdds RHOBS monitoring gating, ROSA HCP detection, and RHOBS Prometheus URL propagation: introduces Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
|
@Chee-Lu: This pull request references OCM-10395 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@Chee-Lu: This pull request references OCM-10395 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/auto-cc |
|
Hi @Chee-Lu. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@Chee-Lu: This pull request references OCM-10395 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@Chee-Lu: This pull request references OCM-10395 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go (1)
102-119: Consider extracting duplicate enableMetricsAccess check.The
enableMetricsAccessvariable is computed identically on lines 33 and 103. Consider computing it once at the function start and reusing it to reduce duplication.The conditional metrics URL configuration looks correct, routing to the appropriate monitoring endpoint (RHOBS Prometheus vs CoreOS Thanos) with the correct protocols.
Apply this diff to eliminate the duplicate check:
func (cvo *clusterVersionOperator) adaptDeployment(cpContext component.WorkloadContext, deployment *appsv1.Deployment) error { + // Enable CVO metrics access if either RHOBS monitoring is enabled or the explicit flag is set + enableMetricsAccess := os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" || cvo.enableCVOManagementClusterMetricsAccess + - // Enable CVO metrics access if either RHOBS monitoring is enabled or the explicit flag is set - enableMetricsAccess := os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" || cvo.enableCVOManagementClusterMetricsAccess - if enableMetricsAccess { if deployment.Spec.Template.Labels == nil { deployment.Spec.Template.Labels = map[string]string{} } deployment.Spec.Template.Labels[config.NeedMetricsServerAccessLabel] = "true" deployment.Spec.Template.Spec.ServiceAccountName = ComponentName } featureSet := configv1.Default if cpContext.HCP.Spec.Configuration != nil && cpContext.HCP.Spec.Configuration.FeatureGate != nil { featureSet = cpContext.HCP.Spec.Configuration.FeatureGate.FeatureSet } // ... (rest of the function) util.UpdateContainer(ComponentName, deployment.Spec.Template.Spec.Containers, func(c *corev1.Container) { util.UpsertEnvVar(c, corev1.EnvVar{ Name: "RELEASE_IMAGE", Value: dataPlaneReleaseImage, }) if updateService := cpContext.HCP.Spec.UpdateService; updateService != "" { c.Args = append(c.Args, "--update-service", string(updateService)) } - // Enable CVO metrics access if either RHOBS monitoring is enabled or the explicit flag is set - enableMetricsAccess := os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" || cvo.enableCVOManagementClusterMetricsAccess - if enableMetricsAccess { c.Args = append(c.Args, "--use-dns-for-services=true") // Configure metrics endpoint based on monitoring stack var metricsURL string if os.Getenv(rhobsmonitoring.EnvironmentVariable) == "1" { // RHOBS Prometheus uses HTTP without TLS metricsURL = fmt.Sprintf("http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090?namespace=%s", cpContext.HCP.Namespace) } else { // CoreOS Thanos uses HTTPS with service CA c.Args = append(c.Args, "--metrics-ca-bundle-file=/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt") metricsURL = fmt.Sprintf("https://thanos-querier.openshift-monitoring.svc:9092?namespace=%s", cpContext.HCP.Namespace) } c.Args = append(c.Args, fmt.Sprintf("--metrics-url=%s", metricsURL)) } }) return nil }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (3)
cmd/install/install.go(1 hunks)control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go(4 hunks)hypershift-operator/controllers/hostedcluster/network_policies.go(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
cmd/install/install.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.gohypershift-operator/controllers/hostedcluster/network_policies.go
🔇 Additional comments (4)
cmd/install/install.go (1)
272-272: LGTM: Documentation clarifies RHOBS monitoring behavior.The updated flag description accurately reflects the additional CVO metrics access capability enabled when
--rhobs-monitoringis set.control-plane-operator/controllers/hostedcontrolplane/v2/cvo/deployment.go (1)
32-41: LGTM: Metrics access properly gated.The logic correctly enables metrics access when either RHOBS monitoring is active (via environment variable) or the explicit flag is set. The label and service account configuration support the network policy egress rules.
hypershift-operator/controllers/hostedcluster/network_policies.go (2)
88-97: LGTM: Metrics access network policy properly gated.The logic correctly creates the metrics server network policy when either RHOBS monitoring is active or the explicit flag is set, consistent with the CVO deployment configuration.
836-910: LGTM: Network policy properly routes to monitoring stack.The refactored function correctly creates egress rules based on the monitoring stack:
- RHOBS: targets OBO Prometheus (port 9090)
- CoreOS: targets Thanos Querier (port 9092)
This aligns with the CVO deployment changes that configure the corresponding metrics URLs. The pod and namespace selectors appropriately identify the monitoring components.
|
@Chee-Lu: This pull request references Jira Issue OCPBUGS-69447, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
/verified by @Chee-Lu |
|
@celebdor: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/ok-to-test |
|
/retest-required |
|
/hold Revision 7932939 was retested 3 times: holding |
|
/retest-required |
|
/test e2e-aws-4-21 |
|
@Chee-Lu: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
@Chee-Lu: Jira Issue Verification Checks: Jira Issue OCPBUGS-69447 Jira Issue OCPBUGS-69447 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@celebdor: new pull request created: #7632 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/cherry-pick release-4.20 |
|
@gaol: new pull request created: #7660 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Fix included in accepted release 4.22.0-0.nightly-2026-02-08-124411 |
What this PR does:
When
--rhobs-monitoring=trueis set (for ROSA HCP), enable CVO access to OBO Prometheus for conditional update risk evaluation.Aldo add
--cvo-prometheus-urlflag to allow overriding the default Prometheus endpoint. This provides flexibility for future changes (e.g., if ROSA changes the service name) or for platforms with different monitoring architectures (e.g., ARO HCP's self-managed Prometheus). When not specified, platform-appropriate defaults are used.The CVO deployment logic routes to different metrics endpoints based on the monitoring stack:
http://hypershift-monitoring-stack-prometheus.openshift-observability-operator.svc:9090https://thanos-querier.openshift-monitoring.svc:9092--cvo-prometheus-urlparameterKey changes:
--rhobs-monitoring(for ROSA HCP) or--enable-cvo-management-cluster-metrics-access(for self-managed HyperShift on OpenShift or ARO HCP) is set--cvo-prometheus-urlflag to configure CVO Prometheus endpointWhich issue(s) this PR fixes:
fixes https://issues.redhat.com//browse/OCM-10395
fixes https://issues.redhat.com//browse/OCM-20970
Special notes for your reviewer:
Backport Requirements
This change should be backported to 4.17.z, 4.18.z, 4.19.z, 4.20.z and 4.21.z to benefit customers on that version.