Skip to content

[release-1.9] chore(ci): update value yamls and disable unused K8s plugins (#4397)#4412

Merged
openshift-merge-bot[bot] merged 17 commits intoredhat-developer:release-1.9from
zdrapela:1.9-del-excl-mark
Mar 25, 2026
Merged

[release-1.9] chore(ci): update value yamls and disable unused K8s plugins (#4397)#4412
openshift-merge-bot[bot] merged 17 commits intoredhat-developer:release-1.9from
zdrapela:1.9-del-excl-mark

Conversation

@zdrapela
Copy link
Copy Markdown
Member

@zdrapela zdrapela commented Mar 16, 2026

Description

Manual cherry-pick of relevant changes from #4397.

Incorporates #4433

Which issue(s) does this PR fix

PR acceptance criteria

Please make sure that the following steps are complete:

  • GitHub Actions are completed and successful
  • Unit Tests are updated and passing
  • E2E Tests are updated and passing
  • Documentation is updated if necessary (requirement for new features)
  • Add a screenshot if the change is UX/UI related

How to test changes / Special notes to the reviewer

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 16, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@zdrapela
Copy link
Copy Markdown
Member Author

/tets e2e-ocp-helm

@zdrapela
Copy link
Copy Markdown
Member Author

/test e2e-aks-helm-nightly
/test e2e-eks-helm-nightly
/test e2e-aks-operator-nightly
/test e2e-eks-operator-nightly

@zdrapela zdrapela changed the title chore(ci): update value yamls and disable unused K8s plugins #4397 chore(ci): update value yamls and disable unused K8s plugins [release-1.9] (#4397) Mar 16, 2026
@zdrapela zdrapela changed the title chore(ci): update value yamls and disable unused K8s plugins [release-1.9] (#4397) [release-1.9] chore(ci): update value yamls and disable unused K8s plugins (#4397) Mar 16, 2026
@zdrapela
Copy link
Copy Markdown
Member Author

/test e2e-ocp-helm-nightly

@zdrapela zdrapela force-pushed the 1.9-del-excl-mark branch from 0638acb to 5b5c60e Compare March 16, 2026 13:12
@github-actions
Copy link
Copy Markdown
Contributor

The container image build workflow finished with status: cancelled.

@zdrapela
Copy link
Copy Markdown
Member Author

/test e2e-aks-helm-nightly
/test e2e-eks-helm-nightly
/test e2e-aks-operator-nightly
/test e2e-eks-operator-nightly
/test e2e-ocp-helm-nightly

@github-actions
Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

@zdrapela
Copy link
Copy Markdown
Member Author

/test e2e-aks-helm-nightly
/test e2e-eks-helm-nightly
/test e2e-aks-operator-nightly
/test e2e-eks-operator-nightly
/test e2e-ocp-helm-nightly
/test e2e-gke-helm-nightly

@github-actions
Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

@zdrapela
Copy link
Copy Markdown
Member Author

/test e2e-ocp-operator-nightly
/test e2e-aks-operator-nightly
/test e2e-eks-operator-nightly

@zdrapela
Copy link
Copy Markdown
Member Author

/test e2e-gke-operator-nightly

@zdrapela zdrapela force-pushed the 1.9-del-excl-mark branch from 9e3209e to 15b196b Compare March 24, 2026 11:35
@zdrapela zdrapela marked this pull request as ready for review March 24, 2026 11:39
@zdrapela
Copy link
Copy Markdown
Member Author

/test e2e-eks-helm-nightly

@rhdh-qodo-merge
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🔒 Security concerns

Potential command injection / unsafe shell composition:
getPodLogsWithGrep builds a shell command string that includes namespace, deploySelector, and filterWords and pipes through grep. If any of these values can be influenced externally (e.g., environment variables in CI or test parameters), this pattern can allow shell metacharacter injection. At minimum, ensure all inputs are trusted/controlled in CI; ideally avoid shell pipelines and pass arguments via execFile without shell interpretation, or strictly escape/validate filterWords and namespace before interpolation.

⚡ Recommended focus areas for review

Fragile Command

The oc logs command now relies on resolving a deployment name via command substitution ($(oc get deploy ...)). This can break if the selector matches zero or multiple deployments, and it also makes failures harder to diagnose (empty substitution can lead to confusing oc logs errors). Validate that the selector always resolves exactly one deployment in all supported environments and consider explicit error handling/logging when the resolution is empty or ambiguous.

const deploySelector = getBackstageDeploySelector();
const tailNumber = 100;

// Resolve the deployment by its metadata labels, then fetch logs from it.
// This works for both Helm and Operator since both set app.kubernetes.io/name
// on the Deployment (with different values), even though pod labels differ.
const deployTarget = `$(oc get deploy -n ${namespace} -l ${deploySelector} -o name)`;
let grepCommand = `oc logs ${deployTarget} --tail=${tailNumber} -c backstage-backend -n ${namespace}`;
for (const word of filterWords) {
  grepCommand += ` | grep '${word}'`;
}
Error Handling

waitForDeploymentReady now resolves the pod selector by reading the deployment before entering the retry loop. If this call fails transiently (API hiccup, RBAC, deployment not yet created), the whole wait fails immediately rather than retrying. Consider moving selector resolution inside the retry loop or handling errors to preserve the intended “wait until ready” behavior. Also confirm logPodConditions(namespace, podSelector) never receives an empty/undefined selector.

const endTime = Date.now() + timeout;

const podSelector = await this.getDeploymentPodSelector(
  deploymentName,
  namespace,
);

while (Date.now() < endTime) {
  try {
    const response = await this.appsApi.readNamespacedDeployment(
      deploymentName,
      namespace,
    );
    const availableReplicas = response.body.status?.availableReplicas || 0;
    const readyReplicas = response.body.status?.readyReplicas || 0;
    const conditions = response.body.status?.conditions || [];

    console.log(`Available replicas: ${availableReplicas}`);
    console.log(
      "Deployment conditions:",
      JSON.stringify(conditions, null, 2),
    );

    // Check for pod failure states when expecting replicas > 0
    if (expectedReplicas > 0 && podSelector) {
      const podFailureReason = await this.checkPodFailureStates(
        namespace,
        podSelector,
      );
      if (podFailureReason) {
        console.error(
          `Pod failure detected: ${podFailureReason}. Logging events and pod logs...`,
        );
        await this.logDeploymentEvents(deploymentName, namespace);
        throw new Error(
          `Deployment ${deploymentName} failed to start: ${podFailureReason}`,
        );
      }
    }

    // Log pod conditions using the deployment's pod selector
    await this.logPodConditions(namespace, podSelector);
📚 Focus areas based on broader codebase context

Override Risk

These value files add global.dynamic.plugins entries with disabled: true to “disable” plugins, but if global.dynamic.plugins is merged (not replaced), this can result in duplicate plugin entries rather than overriding the existing enabled ones. Validate the merge/override behavior in the pipeline/Helm values processing and ensure disabling is applied deterministically (e.g., via a dedicated disabled list or by editing the original plugin entry). (Ref 7, Ref 4)

plugins:
  # Disable plugins to save disk space on GKE
  # These plugins correspond to tests that are ignored in showcase-rbac-k8s project
  # See e2e-tests/playwright.config.ts SHOWCASE_RBAC_K8S config

  # Tekton - not installed on K8s clusters (AKS/EKS/GKE)
  # Not in showcase-rbac-k8s testMatch
  - package: ./dynamic-plugins/dist/backstage-community-plugin-tekton
    disabled: true

  # Adoption Insights - large plugin (~300MB) causing disk space issues
  # Not in showcase-rbac-k8s testMatch
  - package: ./dynamic-plugins/dist/red-hat-developer-hub-backstage-plugin-adoption-insights
    disabled: true
  - package: ./dynamic-plugins/dist/red-hat-developer-hub-backstage-plugin-adoption-insights-backend-dynamic
    disabled: true
  - package: ./dynamic-plugins/dist/red-hat-developer-hub-backstage-plugin-analytics-module-adoption-insights-dynamic
    disabled: true

  # Orchestrator - large OCI plugins causing disk space issues
  # Tests ignored on showcase-rbac-k8s: e2e-tests/playwright/e2e/plugins/orchestrator/*.spec.ts
  - package: 'oci://registry.access.redhat.com/rhdh/red-hat-developer-hub-backstage-plugin-orchestrator:{{ "{{" }}inherit{{ "}}" }}'
    disabled: true
  - package: 'oci://registry.access.redhat.com/rhdh/red-hat-developer-hub-backstage-plugin-orchestrator-backend:{{ "{{" }}inherit{{ "}}" }}'
    disabled: true
  - package: 'oci://registry.access.redhat.com/rhdh/red-hat-developer-hub-backstage-plugin-scaffolder-backend-module-orchestrator:{{ "{{" }}inherit{{ "}}" }}'
    disabled: true
  - package: 'oci://registry.access.redhat.com/rhdh/red-hat-developer-hub-backstage-plugin-orchestrator-form-widgets:{{ "{{" }}inherit{{ "}}" }}'
    disabled: true

Reference reasoning: The existing patterns show plugin lists are intended to be merged with other plugin sets, which means list-item overrides are not guaranteed unless the merge logic explicitly de-duplicates by package. The codebase also uses a separate disabled package list approach, which avoids reliance on list-merge semantics to disable items.

📄 References
  1. redhat-developer/rhdh/dynamic-plugins.default.yaml [975-995]
  2. redhat-developer/rhdh/dynamic-plugins.default.yaml [996-1014]
  3. redhat-developer/rhdh/dynamic-plugins.default.yaml [794-804]
  4. redhat-developer/rhdh/default.packages.yaml [19-35]
  5. redhat-developer/rhdh/scripts/rhdh-openshift-setup/values.yaml [56-78]
  6. redhat-developer/rhdh/scripts/rhdh-openshift-setup/values.yaml [79-102]
  7. redhat-developer/rhdh-chart/charts/backstage/ci/with-orchestrator-and-dynamic-plugins-npmrc-values.yaml [1-24]
  8. redhat-developer/rhdh-chart/charts/backstage/templates/tests/test-secret.yaml [1-3]

@rhdh-qodo-merge
Copy link
Copy Markdown

PR Type

Enhancement, Bug fix


Description

  • Refactor pod/deployment selector logic to support both Helm and Operator deployments

    • Extract deployment-level label selectors into constants
    • Add helper functions to dynamically resolve pod selectors from deployment metadata
    • Update log retrieval to query deployments instead of hardcoded pod labels
  • Remove cluster health check retry logic that was causing test failures

  • Update plugin definitions to use {{inherit}} syntax instead of hardcoded version tags

  • Disable unused plugins (Tekton, Adoption Insights, Orchestrator, Scorecard) on K8s platforms to save disk space

  • Clean up CI pipeline scripts by removing redundant function calls and variables


File Walkthrough

Relevant files
Enhancement
4 files
log-utils.ts
Use deployment selector for pod log retrieval                       
+7/-3     
constants.ts
Add deployment-level label selector constants                       
+13/-0   
helper.ts
Add helper functions for deployment detection                       
+29/-5   
kube-client.ts
Refactor pod selector resolution from deployment metadata
+56/-15 
Miscellaneous
1 files
env_variables.sh
Remove unused Slack webhook URL variable                                 
+0/-1     
Bug fix
6 files
auth-providers.sh
Remove cluster health check call                                                 
+0/-1     
ocp-nightly.sh
Remove cluster health check call                                                 
+0/-1     
ocp-operator.sh
Remove duplicate prepare_operator call                                     
+0/-3     
ocp-pull.sh
Remove cluster health check call                                                 
+0/-1     
upgrade.sh
Remove cluster health check call                                                 
+0/-1     
utils.sh
Remove cluster readiness check function and fix inherit syntax
+1/-24   
Configuration changes
8 files
diff-values_showcase-rbac_AKS.yaml
Disable unused plugins on K8s RBAC tests                                 
+25/-1   
diff-values_showcase-rbac_EKS.yaml
Disable unused plugins on K8s RBAC tests                                 
+25/-1   
diff-values_showcase-rbac_GKE.yaml
Disable unused plugins on GKE RBAC tests                                 
+30/-2   
diff-values_showcase-sanity-plugins.yaml
Update plugin versions to use inherit syntax                         
+25/-25 
diff-values_showcase_AKS.yaml
Disable Tekton and Orchestrator on K8s                                     
+6/-1     
diff-values_showcase_EKS.yaml
Disable Tekton and Orchestrator on K8s                                     
+6/-1     
diff-values_showcase_GKE.yaml
Remove Dynamic Home Page and Scorecard plugins                     
+1/-11   
values_showcase.yaml
Remove exclamation marks from plugin definitions                 
+3/-3     

@rhdh-qodo-merge
Copy link
Copy Markdown

rhdh-qodo-merge bot commented Mar 24, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
High-level
Re-evaluate removal of cluster readiness check

The suggestion warns that removing the wait_for_cluster_ready function from CI
scripts could cause intermittent failures. This function acted as a safeguard
for slow-starting clusters, and its removal might affect CI stability.

Examples:

.ci/pipelines/utils.sh [1427-1429]
  log::error "Failed to login to cluster after ${max_attempts} attempts"
  return 1
}
.ci/pipelines/jobs/ocp-pull.sh [14-15]
  oc_login
  log::info "OCP version: $(oc version)"

Solution Walkthrough:

Before:

# In .ci/pipelines/jobs/ocp-pull.sh
main() {
  ...
  oc_login
  wait_for_cluster_ready
  log::info "OCP version: $(oc version)"
  ...
}

# In .ci/pipelines/utils.sh
wait_for_cluster_ready() {
  log::info "Checking cluster API server readiness..."
  for ((i = 1; i <= max_attempts; i++)); do
    if oc get nodes && oc get route ...; then
      return 0
    fi
    sleep "$wait_seconds"
  done
  log::error "Cluster API server did not become ready..."
  return 1
}

After:

# In .ci/pipelines/jobs/ocp-pull.sh
main() {
  ...
  oc_login
  # The wait_for_cluster_ready call is removed.
  log::info "OCP version: $(oc version)"
  ...
}

# In .ci/pipelines/utils.sh
# The wait_for_cluster_ready function is completely removed.
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies the removal of a crucial cluster readiness check across multiple CI scripts, which could introduce flakiness and instability into the pipelines.

Medium
Possible issue
Restrict to single deployment

Modify the oc get deploy command to handle cases where multiple deployments
match the labels by limiting the result to the first match using head -n1 and
quoting the selector.

e2e-tests/playwright/e2e/audit-log/log-utils.ts [197-198]

-const deployTarget = `$(oc get deploy -n ${namespace} -l ${deploySelector} -o name)`;
+const deployTarget = `$(oc get deploy -n ${namespace} -l '${deploySelector}' -o name | head -n1)`;
 let grepCommand = `oc logs ${deployTarget} --tail=${tailNumber} -c backstage-backend -n ${namespace}`;
  • Apply / Chat
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies a potential failure case in the test utility where multiple matching deployments would break the command, and the proposed fix using head -n1 makes the code more robust.

Low
  • Update

@github-actions
Copy link
Copy Markdown
Contributor

Image was built and published successfully. It is available at:

@zdrapela zdrapela force-pushed the 1.9-del-excl-mark branch from 15b196b to 100d318 Compare March 24, 2026 14:12
@github-actions
Copy link
Copy Markdown
Contributor

The container image build and publish workflows were skipped (either due to [skip-build] tag or no relevant changes with existing image).

@zdrapela
Copy link
Copy Markdown
Member Author

/test e2e-aks-helm-nightly
/test e2e-eks-operator-nightly

@sonarqubecloud
Copy link
Copy Markdown

@zdrapela
Copy link
Copy Markdown
Member Author

/test e2e-ocp-operator-nightly

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 24, 2026

@zdrapela: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gke-helm-nightly 9e3209e link false /test e2e-gke-helm-nightly
ci/prow/e2e-aks-operator-nightly 9e3209e link false /test e2e-aks-operator-nightly
ci/prow/e2e-gke-operator-nightly 9e3209e link false /test e2e-gke-operator-nightly
ci/prow/e2e-eks-helm-nightly 15b196b link false /test e2e-eks-helm-nightly
ci/prow/e2e-ocp-operator-nightly 100d318 link false /test e2e-ocp-operator-nightly
ci/prow/e2e-eks-operator-nightly 100d318 link false /test e2e-eks-operator-nightly
ci/prow/e2e-aks-helm-nightly 100d318 link false /test e2e-aks-helm-nightly

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot added the lgtm label Mar 25, 2026
@openshift-merge-bot openshift-merge-bot bot merged commit d9d0696 into redhat-developer:release-1.9 Mar 25, 2026
15 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants