Skip to content

Add PerfCI e2e pipeline for benchmark-runner (stages 01-05)#79707

Open
arpsharm wants to merge 3 commits into
openshift:mainfrom
arpsharm:benchmark-runner-e2e-pipeline
Open

Add PerfCI e2e pipeline for benchmark-runner (stages 01-05)#79707
arpsharm wants to merge 3 commits into
openshift:mainfrom
arpsharm:benchmark-runner-e2e-pipeline

Conversation

@arpsharm
Copy link
Copy Markdown
Contributor

@arpsharm arpsharm commented May 26, 2026

Summary

  • Rename redhat-performance-benchmark-runner step to redhat-performance-benchmark-runner-03-workloads
  • Add 4 new steps: 01-ocp-deploy (JetLag), 02-operators-deploy (CNV/LSO/ODF), 04-grafana, 05-backup
  • Add redhat-performance-benchmark-runner-e2e workflow chaining 01-02-03-04-05
  • Add periodic config: weekly-e2e runs Saturday 8am UTC
  • All new steps are always_run: false, only triggered explicitly

Summary by CodeRabbit

This PR updates OpenShift CI configuration in the openshift/release repository for the redhat-performance/benchmark-runner jobset to add a PerfCI end-to-end (e2e) pipeline and split the existing workload step into numbered stages.

What changed in practical terms

  • Repository affected: ci-operator configuration and step-registry for redhat-performance/benchmark-runner (CI infra only).
  • Renamed the existing workload step to a numeric stage: redhat-performance-benchmark-runner → redhat-performance-benchmark-runner-03-workloads. The workload step was refactored to prefer shared-directory cluster credentials (shared kubeconfig/kubeadmin_password) and fall back to fetching from the provisioner or secret-backed files.
  • Added four new ordered CI steps with refs, metadata and OWNERS entries:
    • 01-ocp-deploy: JetLag-based bare-metal OCP deployment on a provisioner host via SSH; includes cleanup and retry logic and copies kubeconfig/kubeadmin_password into ${SHARED_DIR}.
    • 02-operators-deploy: Sequential operator installation (CNV/LSO/ODF/Infra/Custom) on the provisioned cluster via SSH; supports applying a CNV nightly catalog when provided.
    • 04-grafana: Remote workflow that clones the benchmark-runner repo, regenerates main.libsonnet and Grafana dashboard JSON (grafonnet), commits/pushes changes when artifacts change, and uploads dashboards to Grafana.
    • 05-backup: Runs deployment/report/backup scripts remotely and performs best-effort podman cleanup on the provisioner.
  • New workflow: redhat-performance-benchmark-runner-e2e that composes stages 01 → 02 → 03 → 04 → 05 into a single e2e pipeline (pre/test/post style), passing shared cluster credentials from the deploy step to downstream steps.
  • Periodic job: adds weekly-e2e periodic scheduled for Saturdays at 08:00 UTC to run the e2e workflow.
  • Execution/triggering: All new steps are not always_run and are only triggered explicitly by the workflow or the periodic. Steps mount credentials from test-credentials/benchmark-runner-perfci-config, request minimal resources (100m CPU, 200Mi memory), and have per-step timeouts/grace periods.
  • Governance: Added/updated OWNERS and metadata files across the new step refs (consistent approvers/reviewers set).

Notable implementation details and review points

  • Step scripts perform runtime SSH interactions with a provisioner and explicitly copy cluster credentials into ${SHARED_DIR} for downstream consumption—reviewers should validate credential handling, permissions, secret-mount usage, and cleanup behavior.
  • The workloads step now prefers shared-directory credentials and will attempt an SSH fetch from the provisioner if needed; this credential precedence and error handling should be validated for typical CI and periodic runs.
  • The Grafana step runs a remote, strict-shell workflow that relies on several environment variables (e.g., GIT_BRANCH, MAIN_LIBSONNET_PATH, GRAFANA_JSON_PATH, GRAFANA_FOLDER_NAME) being set; missing values under set -u could cause failures—confirm defaults or injection points.
  • Several new scripts include retry loops, podman container runs on the provisioner, SCP/SSH orchestration, and remote git pushes—these are non-trivial and warrant careful review for robustness, idempotency, and security (token usage, secret exposure).
  • Timeouts and resource requests are conservative but should be validated against real-world run durations and resource needs.

Overall impact

  • This PR extends the benchmark-runner CI surface to support a full, orchestrated PerfCI e2e pipeline (deploy → operators → workloads → dashboard update → backup), enables scheduled weekly runs and on-demand orchestration, and centralizes passing of cluster credentials between sequential stages while adding ownership metadata for the new steps.

Rename benchmark-runner step to 03-workloads for consistent numbering.
Add steps 01-ocp-deploy, 02-operators-deploy, 04-grafana, 05-backup.
Add e2e workflow chaining all 5 stages (pre/test/post).
Add periodic config: weekly-e2e Saturday 8am UTC.
Add individual step tests to test-step config for PR-triggered validation.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 26, 2026

Walkthrough

Splits benchmark-runner CI into five sequential step refs (01–05) with scripts/refs/metadata/OWNERS, adds an e2e workflow composing them, and updates tests/periodics to run the new workflow weekly against OCP 4.22 nightlies.

Changes

Benchmark Runner E2E Pipeline

Layer / File(s) Summary
Periodics & test-step rewiring
ci-operator/config/redhat-performance/benchmark-runner/*
Adds OCP 4.22 nightly periodics and a weekly-e2e schedule; rewires many tests to redhat-performance-benchmark-runner-03-workloads and appends explicit 01-ocp-deploy, 02-operators-deploy, 04-grafana, 05-backup steps plus an e2e workflow test.
Top-level OWNERS
ci-operator/step-registry/redhat-performance/OWNERS
Adds ebattat to approvers and reviewers for the redhat-performance step-registry scope.
Step 01 — OCP deploy
01-ocp-deploy/*
New JetLag-based deploy script with retries and provisioner pre-clean; exports kubeconfig/kubeadmin-password to SHARED_DIR; adds step ref, metadata, and OWNERS.
Step 02 — Operators deploy
02-operators-deploy/*
New step to optionally apply CNV nightly catalog and sequentially install operators remotely via benchmark-runner container; adds step ref, metadata, and OWNERS.
Step 03 — Workloads
03-workloads/*
Workload commands updated to prefer credentials from SHARED_DIR, fall back to scp/Vault, conditionally start SOCKS proxy; ref renamed to 03-workloads with metadata and OWNERS.
Step 04 — Grafana update
04-grafana/*
Remote workflow clones repo, updates main.libsonnet, regenerates dashboards via grafonnet, conditionally uploads to Grafana, and commits/pushes changes; adds step ref, metadata, and OWNERS.
Step 05 — Backup & report
05-backup/*
Runs deployment/report-generation and backup scripts on provisioner when present, performs best-effort podman cleanup; adds step ref, metadata, and OWNERS.
E2E workflow
e2e/*
New e2e workflow composing 0102030405 into pre/test/post phases with documented credential handoff and updated workflow metadata.
sequenceDiagram
  actor CI_Workflow
  participant Provisioner
  participant BenchmarkRunner as "benchmark-runner container"
  participant SharedDir as "SHARED_DIR"
  participant Grafana
  participant Git

  CI_Workflow->>Provisioner: SSH run Step01 (JetLag deploy)
  Provisioner->>BenchmarkRunner: podman run deploy (JetLag)
  BenchmarkRunner->>Provisioner: perform install, write kubeconfig+kubeadmin-password
  Provisioner->>SharedDir: scp kubeconfig + kubeadmin-password
  CI_Workflow->>Provisioner: SSH run Step02 (operators via benchmark-runner)
  CI_Workflow->>Provisioner: SSH run Step03 (workloads) using creds from SharedDir
  CI_Workflow->>Provisioner: SSH run Step04 (generate/send dashboards)
  Provisioner->>Grafana: update dashboard via API
  Provisioner->>Git: commit & push dashboard/libsonnet changes
  CI_Workflow->>Provisioner: SSH run Step05 (backup/report)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

lgtm, approved, rehearsals-ack

Suggested reviewers

  • ebattat
🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately and clearly describes the main change: adding a complete e2e pipeline for benchmark-runner with stages 01-05, which is the primary objective of this changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR contains only CI configuration YAML, shell scripts, and metadata files - no Ginkgo test code with It(), Describe(), Context() declarations. Custom check for test name stability is not applicable.
Test Structure And Quality ✅ Passed This PR contains no Ginkgo test code—only YAML configs, shell scripts, and JSON metadata. The test structure check is not applicable to CI infrastructure changes.
Microshift Test Compatibility ✅ Passed This PR adds no Ginkgo e2e test code. Changes are only CI/CD infrastructure (YAML config, shell scripts, metadata). The MicroShift compatibility check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed PR adds CI/CD pipeline configuration but no Ginkgo e2e tests. SNO compatibility check applies only to Ginkgo tests (It, Describe, Context, When), which are not present in this PR.
Topology-Aware Scheduling Compatibility ✅ Passed PR adds only CI/CD configuration and test orchestration scripts, not deployment manifests or operator code; no pod scheduling constraints are introduced.
Ote Binary Stdout Contract ✅ Passed PR contains no Go code (only YAML config, shell scripts, JSON metadata). OTE Binary Stdout Contract applies only to Go test binaries; not applicable here.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR does not add any Ginkgo e2e tests; custom check only applies to new test code (It/Describe/Context). This PR adds only CI/CD YAML configuration, shell scripts, and metadata files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 26, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 26, 2026

Hi @arpsharm. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci Bot requested review from ebattat and jeniferh May 26, 2026 13:09
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: arpsharm
Once this PR has been reviewed and has the lgtm label, please assign pmacik, robertkrawitz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

@ebattat ebattat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok-to-test

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
ci-operator/config/redhat-performance/benchmark-runner/redhat-performance-benchmark-runner-main__periodics.yaml (1)

1-27: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add the required periodic variant sections.

Line 1 starts a periodic variant config that omits base_images, build_root, images, and promotion. This violates the periodic variant structure rule and can break expected CI config consistency.

As per coding guidelines: “Variant periodic configuration files should include base_images, build_root, images, promotion, and releases sections copied from main config, with only tests: entries containing interval: or cron: scheduling”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/config/redhat-performance/benchmark-runner/redhat-performance-benchmark-runner-main__periodics.yaml`
around lines 1 - 27, The periodic variant YAML is missing required sections: add
back the copied sections base_images, build_root, images, and promotion
(alongside the existing releases) so the file follows the variant periodic
structure; keep the existing tests block (with cron/interval) and
zz_generated_metadata as-is, and ensure the workflow name
redhat-performance-benchmark-runner-e2e remains under tests — copy those
sections from the main config into this periodic variant to restore consistency
and include only schedule changes in tests.
ci-operator/step-registry/redhat-performance/benchmark-runner/03-workloads/redhat-performance-benchmark-runner-03-workloads-commands.sh (1)

53-53: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale kubeconfig error message.

Line 53 now misleads operators because kubeconfig is accepted from both SHARED_DIR and /secret.

Proposed fix
-  echo "ERROR: no kubeconfig found at /secret/kubeconfig" >&2
+  echo "ERROR: no kubeconfig found at \${SHARED_DIR:-}/kubeconfig or /secret/kubeconfig" >&2
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/03-workloads/redhat-performance-benchmark-runner-03-workloads-commands.sh`
at line 53, The error message echo "ERROR: no kubeconfig found at
/secret/kubeconfig" is stale — update it to reflect that kubeconfig may be
provided from either SHARED_DIR or /secret; change the message produced by that
echo (the literal string "/secret/kubeconfig") to mention both locations (for
example reference "$SHARED_DIR" and "/secret/kubeconfig") and keep the stderr
redirection (>&2) so operators see the correct, non-misleading path info.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/01-ocp-deploy/redhat-performance-benchmark-runner-01-ocp-deploy-commands.sh`:
- Around line 24-26: The script currently echoes the sensitive provisioner
endpoint via the PROVISION_IP variable in the OCP deployment step; remove or
mask that value to avoid leaking secret data. Update the echo that references
PROVISION_IP (the line printing "Provisioner: ${PROVISION_IP}") to either omit
the variable entirely or print a redacted placeholder (e.g., "Provisioner:
[REDACTED]") and ensure no other prints reference PROVISION_IP so the secret
from /secret/bastion_address is never logged.

In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/02-operators-deploy/redhat-performance-benchmark-runner-02-operators-deploy-commands.sh`:
- Around line 53-57: The heredoc currently performs client-side expansion of
${CNV_VERSION} with an unquoted heredoc and runs sed locally, which is fragile;
instead validate CNV_VERSION (e.g., allow only characters matching
^[A-Za-z0-9._-]+) and avoid client-side interpolation by moving the replacement
into the remote session or using a single-quoted heredoc so the remote shell
does the substitution safely; specifically, check/normalize the CNV_VERSION
variable in the script, then in the ssh block run sed on the remote host
(referencing the existing ssh command and the sed invocation "sed -i \"s/{{
cnv_version }}/${CNV_VERSION}/g\" /tmp/catalog_source.yaml") or use a
remote-aware substitution like printf '%s' "$CNV_VERSION" > /tmp/cnv_version &&
sed -i "s/{{ cnv_version }}/$(cat /tmp/cnv_version)/g" /tmp/catalog_source.yaml
so interpolation happens on the remote side and only validated content is used.

In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/03-workloads/redhat-performance-benchmark-runner-03-workloads-commands.sh`:
- Around line 18-22: The SHARED_DIR branch copies SHARED_DIR/kubeconfig to
/tmp/kubeconfig and sets KUBECONFIG but skips the SOCKS/API reachability/proxy
bootstrap logic that runs in the /secret/kubeconfig branch, which breaks private
VLAN runs; modify the script so the proxy bootstrap and API/SOCKS reachability
setup is executed for both cases — either move the common proxy/bootstrap
commands out of the /secret/kubeconfig-only branch into a shared section run
after KUBECONFIG is set, or call the same setup routine from the SHARED_DIR
branch; look for the SHARED_DIR check, the /secret/kubeconfig branch, and
references to KUBECONFIG and the proxy/API reachability commands and ensure they
are invoked for SHARED_DIR-derived kubeconfigs too.

In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/04-grafana/redhat-performance-benchmark-runner-04-grafana-commands.sh`:
- Around line 23-38: The script currently reads ELASTICSEARCH, ES_PORT,
GRAFANA_URL, GRAFANA_API_KEY, GIT_TOKEN, GIT_REPOSITORY, GIT_EMAIL and
GIT_USERNAME as optional but later assumes they exist; add a fail-fast
validation block (before any SSH/remote operations) that checks each required
variable (ELASTICSEARCH, ES_PORT, GRAFANA_URL, GRAFANA_API_KEY, GIT_TOKEN,
GIT_REPOSITORY, GIT_EMAIL, GIT_USERNAME) with a test like -z and prints a clear
error identifying the missing variable, then exits non‑zero so the script stops
immediately on missing inputs. Ensure the error messages reference the variable
names exactly so failures are easy to diagnose.

In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/05-backup/redhat-performance-benchmark-runner-05-backup-commands.sh`:
- Around line 21-45: Heredocs are unquoted so ${DEPLOYMENT_POD_SCRIPT} and
${BACKUP_SCRIPT} are expanded locally and can be injected; change the heredoc
delimiters to quoted forms (use <<'DEPLOY_EOF' and <<'BACKUP_EOF') to prevent
local expansion, and inside the remote blocks update the invocations to use
quoted expansions on the remote side (invoke "${DEPLOYMENT_POD_SCRIPT}" and
"${BACKUP_SCRIPT}" instead of unquoted
${DEPLOYMENT_POD_SCRIPT}/${BACKUP_SCRIPT}) so whitespace/metacharacters are
handled safely; keep the existing set -euo pipefail and the if [[ -x ... ]]
checks in the DEPLOY_EOF and BACKUP_EOF blocks.

---

Outside diff comments:
In
`@ci-operator/config/redhat-performance/benchmark-runner/redhat-performance-benchmark-runner-main__periodics.yaml`:
- Around line 1-27: The periodic variant YAML is missing required sections: add
back the copied sections base_images, build_root, images, and promotion
(alongside the existing releases) so the file follows the variant periodic
structure; keep the existing tests block (with cron/interval) and
zz_generated_metadata as-is, and ensure the workflow name
redhat-performance-benchmark-runner-e2e remains under tests — copy those
sections from the main config into this periodic variant to restore consistency
and include only schedule changes in tests.

In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/03-workloads/redhat-performance-benchmark-runner-03-workloads-commands.sh`:
- Line 53: The error message echo "ERROR: no kubeconfig found at
/secret/kubeconfig" is stale — update it to reflect that kubeconfig may be
provided from either SHARED_DIR or /secret; change the message produced by that
echo (the literal string "/secret/kubeconfig") to mention both locations (for
example reference "$SHARED_DIR" and "/secret/kubeconfig") and keep the stderr
redirection (>&2) so operators see the correct, non-misleading path info.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 77068b8a-8549-451d-9e91-3b99c20f47db

📥 Commits

Reviewing files that changed from the base of the PR and between ecc3622 and 24bf9fa.

⛔ Files ignored due to path filters (2)
  • ci-operator/jobs/redhat-performance/benchmark-runner/redhat-performance-benchmark-runner-main-periodics.yaml is excluded by !ci-operator/jobs/**
  • ci-operator/jobs/redhat-performance/benchmark-runner/redhat-performance-benchmark-runner-main-presubmits.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (26)
  • ci-operator/config/redhat-performance/benchmark-runner/redhat-performance-benchmark-runner-main__periodics.yaml
  • ci-operator/config/redhat-performance/benchmark-runner/redhat-performance-benchmark-runner-main__test-step.yaml
  • ci-operator/step-registry/redhat-performance/OWNERS
  • ci-operator/step-registry/redhat-performance/benchmark-runner/01-ocp-deploy/OWNERS
  • ci-operator/step-registry/redhat-performance/benchmark-runner/01-ocp-deploy/redhat-performance-benchmark-runner-01-ocp-deploy-commands.sh
  • ci-operator/step-registry/redhat-performance/benchmark-runner/01-ocp-deploy/redhat-performance-benchmark-runner-01-ocp-deploy-ref.metadata.json
  • ci-operator/step-registry/redhat-performance/benchmark-runner/01-ocp-deploy/redhat-performance-benchmark-runner-01-ocp-deploy-ref.yaml
  • ci-operator/step-registry/redhat-performance/benchmark-runner/02-operators-deploy/OWNERS
  • ci-operator/step-registry/redhat-performance/benchmark-runner/02-operators-deploy/redhat-performance-benchmark-runner-02-operators-deploy-commands.sh
  • ci-operator/step-registry/redhat-performance/benchmark-runner/02-operators-deploy/redhat-performance-benchmark-runner-02-operators-deploy-ref.metadata.json
  • ci-operator/step-registry/redhat-performance/benchmark-runner/02-operators-deploy/redhat-performance-benchmark-runner-02-operators-deploy-ref.yaml
  • ci-operator/step-registry/redhat-performance/benchmark-runner/03-workloads/OWNERS
  • ci-operator/step-registry/redhat-performance/benchmark-runner/03-workloads/redhat-performance-benchmark-runner-03-workloads-commands.sh
  • ci-operator/step-registry/redhat-performance/benchmark-runner/03-workloads/redhat-performance-benchmark-runner-03-workloads-ref.metadata.json
  • ci-operator/step-registry/redhat-performance/benchmark-runner/03-workloads/redhat-performance-benchmark-runner-03-workloads-ref.yaml
  • ci-operator/step-registry/redhat-performance/benchmark-runner/04-grafana/OWNERS
  • ci-operator/step-registry/redhat-performance/benchmark-runner/04-grafana/redhat-performance-benchmark-runner-04-grafana-commands.sh
  • ci-operator/step-registry/redhat-performance/benchmark-runner/04-grafana/redhat-performance-benchmark-runner-04-grafana-ref.metadata.json
  • ci-operator/step-registry/redhat-performance/benchmark-runner/04-grafana/redhat-performance-benchmark-runner-04-grafana-ref.yaml
  • ci-operator/step-registry/redhat-performance/benchmark-runner/05-backup/OWNERS
  • ci-operator/step-registry/redhat-performance/benchmark-runner/05-backup/redhat-performance-benchmark-runner-05-backup-commands.sh
  • ci-operator/step-registry/redhat-performance/benchmark-runner/05-backup/redhat-performance-benchmark-runner-05-backup-ref.metadata.json
  • ci-operator/step-registry/redhat-performance/benchmark-runner/05-backup/redhat-performance-benchmark-runner-05-backup-ref.yaml
  • ci-operator/step-registry/redhat-performance/benchmark-runner/e2e/OWNERS
  • ci-operator/step-registry/redhat-performance/benchmark-runner/e2e/redhat-performance-benchmark-runner-e2e-workflow.metadata.json
  • ci-operator/step-registry/redhat-performance/benchmark-runner/e2e/redhat-performance-benchmark-runner-e2e-workflow.yaml

Copy link
Copy Markdown
Contributor

@ebattat ebattat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test

@openshift-ci openshift-ci Bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 26, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
ci-operator/step-registry/redhat-performance/benchmark-runner/04-grafana/redhat-performance-benchmark-runner-04-grafana-commands.sh (1)

40-53: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate the non-secret step inputs used by the SSH block.

This block now checks the Vault secrets, but GIT_BRANCH, MAIN_LIBSONNET_PATH, GRAFANA_JSON_PATH, and GRAFANA_FOLDER_NAME are still expanded locally on Line 59. If any one is missing from the step env, set -u kills the job before the SSH session starts.

💡 Suggested fix
 [[ -z "${GIT_TOKEN}" ]] && MISSING="${MISSING} git_token"
 [[ -z "${GIT_REPOSITORY}" ]] && MISSING="${MISSING} git_repository"
 [[ -z "${GIT_EMAIL}" ]] && MISSING="${MISSING} git_email"
 [[ -z "${GIT_USERNAME}" ]] && MISSING="${MISSING} git_username"
+[[ -z "${GIT_BRANCH:-}" ]] && MISSING="${MISSING} GIT_BRANCH"
+[[ -z "${MAIN_LIBSONNET_PATH:-}" ]] && MISSING="${MISSING} MAIN_LIBSONNET_PATH"
+[[ -z "${GRAFANA_JSON_PATH:-}" ]] && MISSING="${MISSING} GRAFANA_JSON_PATH"
+[[ -z "${GRAFANA_FOLDER_NAME:-}" ]] && MISSING="${MISSING} GRAFANA_FOLDER_NAME"
 if [[ -n "${MISSING}" ]]; then
   echo "ERROR: missing required Vault secrets:${MISSING}" >&2
   exit 1
 fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/04-grafana/redhat-performance-benchmark-runner-04-grafana-commands.sh`
around lines 40 - 53, Add validation for the non-secret environment variables
that are expanded locally before the SSH block — check GIT_BRANCH,
MAIN_LIBSONNET_PATH, GRAFANA_JSON_PATH, and GRAFANA_FOLDER_NAME just like the
existing Vault secret checks (append missing names to MISSING and error/exit if
non-empty) so that set -u won't cause the job to die before the SSH session;
update the same validation block that currently references ELASTICSEARCH,
ES_PORT, GRAFANA_URL, GRAFANA_API_KEY, GIT_TOKEN, GIT_REPOSITORY, GIT_EMAIL and
GIT_USERNAME to include these four variable names.
ci-operator/step-registry/redhat-performance/benchmark-runner/05-backup/redhat-performance-benchmark-runner-05-backup-commands.sh (1)

27-27: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop interpolating the remote script paths on the client.

Lines 27 and 31 still splice ${DEPLOYMENT_POD_SCRIPT} and ${BACKUP_SCRIPT} into a client-side shell string. A quote or whitespace in either path can change the remote command or make it fail to parse.

💡 Suggested fix
-ssh ${SSH_ARGS} root@"${PROVISION_IP}" "set -euo pipefail; if [[ -x '${DEPLOYMENT_POD_SCRIPT}' ]]; then '${DEPLOYMENT_POD_SCRIPT}'; else echo 'WARNING: deployment pod script not found' >&2; fi"
+ssh ${SSH_ARGS} root@"${PROVISION_IP}" bash -s -- "${DEPLOYMENT_POD_SCRIPT}" <<'DEPLOY_EOF'
+set -euo pipefail
+deployment_pod_script="$1"
+if [[ -x "${deployment_pod_script}" ]]; then
+  "${deployment_pod_script}"
+else
+  echo "WARNING: deployment pod script not found" >&2
+fi
+DEPLOY_EOF
@@
-ssh ${SSH_ARGS} root@"${PROVISION_IP}" "set -euo pipefail; if [[ -x '${BACKUP_SCRIPT}' ]]; then '${BACKUP_SCRIPT}'; else echo 'WARNING: backup script not found' >&2; fi"
+ssh ${SSH_ARGS} root@"${PROVISION_IP}" bash -s -- "${BACKUP_SCRIPT}" <<'BACKUP_EOF'
+set -euo pipefail
+backup_script="$1"
+if [[ -x "${backup_script}" ]]; then
+  "${backup_script}"
+else
+  echo "WARNING: backup script not found" >&2
+fi
+BACKUP_EOF

Also applies to: 31-31

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/05-backup/redhat-performance-benchmark-runner-05-backup-commands.sh`
at line 27, The client currently interpolates ${DEPLOYMENT_POD_SCRIPT} and
${BACKUP_SCRIPT} into the SSH command string which breaks if those paths contain
quotes/spaces; instead, stop expanding those variables on the client and pass
them as separate arguments to the remote shell (e.g., send a fixed remote script
that references a positional parameter like "$1" and then pass the local
${DEPLOYMENT_POD_SCRIPT} or ${BACKUP_SCRIPT} as an extra ssh argument). Update
both invocations that use SSH_ARGS/PROVISION_IP so the remote command uses a
safe quoted reference (e.g., if [[ -x "$1" ]]; then "$1"; else echo ... >&2; fi)
and supply the corresponding local variable as the positional argument to ssh.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/03-workloads/redhat-performance-benchmark-runner-03-workloads-commands.sh`:
- Around line 65-71: The script currently reads ES_PORT unguarded
(ES_PORT=$(<"/secret/elasticsearch_port")) inside the bastion SOCKS proxy block,
which under set -e will abort with a file-read error if the secret isn't
mounted; before attempting to open the bastion tunnel (the ssh line that uses
SOCKS_PORT and ES_PORT), add a guarded check that the secret file
"/secret/elasticsearch_port" exists and is non-empty, read it into ES_PORT only
after the check, trim newlines (as done now), and if missing or empty emit a
clear fatal message and exit non-zero so the failure is explicit rather than a
cryptic read error.

---

Duplicate comments:
In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/04-grafana/redhat-performance-benchmark-runner-04-grafana-commands.sh`:
- Around line 40-53: Add validation for the non-secret environment variables
that are expanded locally before the SSH block — check GIT_BRANCH,
MAIN_LIBSONNET_PATH, GRAFANA_JSON_PATH, and GRAFANA_FOLDER_NAME just like the
existing Vault secret checks (append missing names to MISSING and error/exit if
non-empty) so that set -u won't cause the job to die before the SSH session;
update the same validation block that currently references ELASTICSEARCH,
ES_PORT, GRAFANA_URL, GRAFANA_API_KEY, GIT_TOKEN, GIT_REPOSITORY, GIT_EMAIL and
GIT_USERNAME to include these four variable names.

In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/05-backup/redhat-performance-benchmark-runner-05-backup-commands.sh`:
- Line 27: The client currently interpolates ${DEPLOYMENT_POD_SCRIPT} and
${BACKUP_SCRIPT} into the SSH command string which breaks if those paths contain
quotes/spaces; instead, stop expanding those variables on the client and pass
them as separate arguments to the remote shell (e.g., send a fixed remote script
that references a positional parameter like "$1" and then pass the local
${DEPLOYMENT_POD_SCRIPT} or ${BACKUP_SCRIPT} as an extra ssh argument). Update
both invocations that use SSH_ARGS/PROVISION_IP so the remote command uses a
safe quoted reference (e.g., if [[ -x "$1" ]]; then "$1"; else echo ... >&2; fi)
and supply the corresponding local variable as the positional argument to ssh.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 8d85b206-cb71-4aad-9b15-b5d844f76514

📥 Commits

Reviewing files that changed from the base of the PR and between 24bf9fa and 1351577.

📒 Files selected for processing (5)
  • ci-operator/step-registry/redhat-performance/benchmark-runner/01-ocp-deploy/redhat-performance-benchmark-runner-01-ocp-deploy-commands.sh
  • ci-operator/step-registry/redhat-performance/benchmark-runner/02-operators-deploy/redhat-performance-benchmark-runner-02-operators-deploy-commands.sh
  • ci-operator/step-registry/redhat-performance/benchmark-runner/03-workloads/redhat-performance-benchmark-runner-03-workloads-commands.sh
  • ci-operator/step-registry/redhat-performance/benchmark-runner/04-grafana/redhat-performance-benchmark-runner-04-grafana-commands.sh
  • ci-operator/step-registry/redhat-performance/benchmark-runner/05-backup/redhat-performance-benchmark-runner-05-backup-commands.sh
💤 Files with no reviewable changes (1)
  • ci-operator/step-registry/redhat-performance/benchmark-runner/01-ocp-deploy/redhat-performance-benchmark-runner-01-ocp-deploy-commands.sh

Comment on lines +65 to +71
if [[ -n "${BASTION_ADDRESS:-}" ]]; then
if ! curl -sk --max-time 5 "${CLUSTER_SERVER}/version" &>/dev/null; then
echo "API not directly reachable, starting SOCKS proxy through bastion"
SOCKS_PORT=$((RANDOM % 55536 + 10000))
ES_PORT=$(<"/secret/elasticsearch_port")
ES_PORT="${ES_PORT%$'\n'}"
ssh ${SSH_ARGS} root@"${BASTION_ADDRESS}" -fNT -D "${SOCKS_PORT}" -L "127.0.0.1:${ES_PORT}:localhost:${ES_PORT}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard /secret/elasticsearch_port before opening the bastion ES tunnel (line 69-71)

ES_PORT=$(<"/secret/elasticsearch_port") is unconditionally evaluated under set -e, so if the secret/volume isn’t mounted when the bastion SOCKS proxy is started, the script will abort with a confusing file-read error. Add a size check and fail fast with a clear message.

💡 Suggested fix
 if [[ -n "${BASTION_ADDRESS:-}" ]]; then
   if ! curl -sk --max-time 5 "${CLUSTER_SERVER}/version" &amp;&gt;/dev/null; then
     echo "API not directly reachable, starting SOCKS proxy through bastion"
     SOCKS_PORT=$((RANDOM % 55536 + 10000))
+    if [[ ! -s /secret/elasticsearch_port ]]; then
+      echo "ERROR: /secret/elasticsearch_port is required when opening the ES tunnel" &gt;&amp;2
+      exit 1
+    fi
     ES_PORT=$(&lt;"/secret/elasticsearch_port")
     ES_PORT="${ES_PORT%$'\n'}"
     ssh ${SSH_ARGS} root@"${BASTION_ADDRESS}" -fNT -D "${SOCKS_PORT}" -L "127.0.0.1:${ES_PORT}:localhost:${ES_PORT}"
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 71-71: Double quote to prevent globbing and word splitting.

(SC2086)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/03-workloads/redhat-performance-benchmark-runner-03-workloads-commands.sh`
around lines 65 - 71, The script currently reads ES_PORT unguarded
(ES_PORT=$(<"/secret/elasticsearch_port")) inside the bastion SOCKS proxy block,
which under set -e will abort with a file-read error if the secret isn't
mounted; before attempting to open the bastion tunnel (the ssh line that uses
SOCKS_PORT and ES_PORT), add a guarded check that the secret file
"/secret/elasticsearch_port" exists and is non-empty, read it into ES_PORT only
after the check, trim newlines (as done now), and if missing or empty emit a
clear fatal message and exit non-zero so the failure is explicit rather than a
cryptic read error.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

[REHEARSALNOTIFIER]
@arpsharm: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name Repo Type Reason
pull-ci-redhat-performance-benchmark-runner-main-test-step-01-ocp-deploy redhat-performance/benchmark-runner presubmit Presubmit changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-02-operators-deploy redhat-performance/benchmark-runner presubmit Presubmit changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-04-grafana redhat-performance/benchmark-runner presubmit Presubmit changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-05-backup redhat-performance/benchmark-runner presubmit Presubmit changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-e2e redhat-performance/benchmark-runner presubmit Presubmit changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-all redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-bootstorm-vm redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-bootstorm-vm-scale redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-fio-pod redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-fio-pod-scale redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-fio-vm redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-fio-vm-scale redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-hammerdb-pod-mariadb redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-hammerdb-pod-mariadb-lso redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-hammerdb-pod-mssql redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-hammerdb-pod-mssql-lso redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-hammerdb-pod-postgres redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-hammerdb-pod-postgres-lso redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-hammerdb-vm-mariadb redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-hammerdb-vm-mariadb-lso redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-hammerdb-vm-mssql redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-hammerdb-vm-mssql-lso redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-hammerdb-vm-postgres redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-hammerdb-vm-postgres-lso redhat-performance/benchmark-runner presubmit Ci-operator config changed
pull-ci-redhat-performance-benchmark-runner-main-test-step-sysbench-pod redhat-performance/benchmark-runner presubmit Ci-operator config changed

A total of 37 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/04-grafana/redhat-performance-benchmark-runner-04-grafana-commands.sh`:
- Line 80: The git clone command currently hard-codes
"https://github.com/redhat-performance/benchmark-runner" instead of using the
validated GIT_REPOSITORY variable; update the clone invocation in the script
(the git clone line that uses GIT_BRANCH) to clone from "${GIT_REPOSITORY}" with
--depth 1 -b "${GIT_BRANCH}" so the step uses the configured/secret repository
rather than the fixed GitHub URL.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3c368ca5-f7aa-44cd-a988-a269318f70d9

📥 Commits

Reviewing files that changed from the base of the PR and between 1351577 and 5f5784f.

📒 Files selected for processing (2)
  • ci-operator/step-registry/redhat-performance/benchmark-runner/02-operators-deploy/redhat-performance-benchmark-runner-02-operators-deploy-commands.sh
  • ci-operator/step-registry/redhat-performance/benchmark-runner/04-grafana/redhat-performance-benchmark-runner-04-grafana-commands.sh

mkdir -p "${WORK_DIR}"
cd "${WORK_DIR}"

git clone --depth 1 -b "${GIT_BRANCH}" https://github.com/redhat-performance/benchmark-runner .
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clone from GIT_REPOSITORY instead of hard-coding the repo URL.

The script validates GIT_REPOSITORY and then ignores it here. If that secret ever points at a fork or different host, this step clones one repository but pulls/pushes another.

Suggested fix
-git clone --depth 1 -b "${GIT_BRANCH}" https://github.com/redhat-performance/benchmark-runner .
+git clone --depth 1 -b "${GIT_BRANCH}" "https://${GIT_REPOSITORY}" .
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
git clone --depth 1 -b "${GIT_BRANCH}" https://github.com/redhat-performance/benchmark-runner .
git clone --depth 1 -b "${GIT_BRANCH}" "https://${GIT_REPOSITORY}" .
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/redhat-performance/benchmark-runner/04-grafana/redhat-performance-benchmark-runner-04-grafana-commands.sh`
at line 80, The git clone command currently hard-codes
"https://github.com/redhat-performance/benchmark-runner" instead of using the
validated GIT_REPOSITORY variable; update the clone invocation in the script
(the git clone line that uses GIT_BRANCH) to clone from "${GIT_REPOSITORY}" with
--depth 1 -b "${GIT_BRANCH}" so the step uses the configured/secret repository
rather than the fixed GitHub URL.

@arpsharm
Copy link
Copy Markdown
Contributor Author

/pj-rehearse pull-ci-redhat-performance-benchmark-runner-main-test-step-sysbench-pod

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

@arpsharm: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 26, 2026

@arpsharm: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/rehearse/redhat-performance/benchmark-runner/main/test-step-sysbench-pod 5f5784f link unknown /pj-rehearse pull-ci-redhat-performance-benchmark-runner-main-test-step-sysbench-pod

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ok-to-test Indicates a non-member PR verified by an org member that is safe to test.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants