
USHIFT-6799: C2CC: Latency measurement between clusters #6794

Open
pmtk wants to merge 5 commits into openshift:main from pmtk:c2cc/latency

pmtk (Member) commented Jun 3, 2026

Summary by CodeRabbit

  • New Features

    • RemoteCluster status now exposes latency statistics: avg, min, max, last, stddev (rolling-window).
  • Improvements

    • Probe loop measures RTT and updates status with latency stats; failed probes preserve prior latency values.
  • Tests

    • Added unit and end-to-end tests validating latency collection, statistical calculations, wrap-around behavior, and status reporting.
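For reference, the statistics listed above can be written out as follows. This is a sketch under one assumption: the PR page does not say whether stddev is the population or the sample standard deviation, so the population form over the $N \le 20$ samples currently held in the window is shown, with `last` being the most recently recorded sample.

$$
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2},\qquad
x_{\min} = \min_i x_i,\qquad
x_{\max} = \max_i x_i
$$

For example, samples of 1 ms, 2 ms and 3 ms give avg = 2 ms, min = 1 ms, max = 3 ms and $\sigma = \sqrt{2/3}\ \text{ms} \approx 0.816\ \text{ms}$.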

pmtk added 3 commits June 3, 2026 09:33
The probe pod will populate latency statistics from a rolling window
of probe samples. Using metav1.Duration fields (serialized as Go
duration strings like "1.234ms") avoids controller-gen's restriction
on float types while keeping values human-readable.
Circular buffer of 20 samples that computes avg, min, max, last, and
stddev as time.Duration values.
Each successful HTTP probe now measures round-trip time. The RTT is
added to a per-remote-cluster latencyWindow (circular buffer of 20
samples), and the computed stats are written to status.latency on
every status update. The window resets when a probe is restarted
(e.g. on spec change). Latency stats are preserved across failed
probes to avoid losing data during transient failures.
openshift-ci Bot (Contributor) commented Jun 3, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 3, 2026
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 3, 2026
openshift-ci-robot commented Jun 3, 2026

@pmtk: This pull request references USHIFT-6799 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai Bot (Contributor) commented Jun 3, 2026

Walkthrough

This PR adds rolling-window latency collection and stats for RemoteCluster probes: a ring buffer computes avg/min/max/last/stddev, probe manager records RTTs and sets status.Latency, CRD and Go types expose the stats, and Robot tests verify end-to-end reporting.

Changes

Latency Tracking Feature

  • Data model and CRD (pkg/apis/microshift/v1alpha1/types.go, assets/crd/microshift.io_remoteclusters.yaml): RemoteClusterStatus gains an optional Latency field and a LatencyStats type; the CRD status schema adds required avg, last, max, min, and stddev string fields.
  • Latency window ring buffer and unit tests (pkg/controllers/c2cc/latency.go, pkg/controllers/c2cc/latency_test.go): a fixed-size circular buffer collects RTT samples with add() and computes avg/min/max/last/stddev via stats(); unit tests cover empty, single, partial, wrap-around, deterministic, and identical-sample cases.
  • Probe manager latency integration (pkg/controllers/c2cc/probe.go): ProbeManager gains per-cluster latency windows; doProbe returns the RTT; the probe loop records RTTs, sets status.Latency from the window stats, and preserves the previous latency on failed probes.
  • Robot Framework integration tests (test/suites/c2cc/probe.robot): tests wait for latency stats and assert that avg/min/max/last/stddev are populated for remote clusters; a helper keyword queries the status.latency fields via jsonpath.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • openshift/microshift#6727: Introduces the initial RemoteCluster CRD and Go types; this PR extends the same resources with latency-specific schema and controller wiring.

Suggested labels

ready-for-human-review

Suggested reviewers

  • jogeo
  • kasturinarra

Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

  • No-Sensitive-Data-In-Logs (❌ Error): The logging statement in probe.go logs rc.Spec.ProbeTarget (an internal IP:port address), which is flagged as sensitive internal hostname data. Resolution: Remove or redact ProbeTarget from the "Starting probe for" log statement, or change it to Info level only if necessary.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 12.50%, which is insufficient; the required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
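One way to follow the "redact" resolution while keeping the log line useful is to log only the port of the probe target. The helper below is a hypothetical sketch (`redactTarget` and the example cluster name are not from the PR), relying only on the check's description of ProbeTarget as an IP:port address.

```go
package main

import (
	"fmt"
	"net"
)

// redactTarget is a hypothetical helper: it hides the host part of a
// "host:port" probe target and keeps only the port.
func redactTarget(target string) string {
	_, port, err := net.SplitHostPort(target)
	if err != nil {
		// Unparseable input: hide it entirely rather than leak it.
		return "<redacted>"
	}
	return net.JoinHostPort("<redacted>", port)
}

func main() {
	// Prints: Starting probe for cluster-b (target <redacted>:8080)
	fmt.Printf("Starting probe for %s (target %s)\n", "cluster-b", redactTarget("10.44.0.1:8080"))
}
```

Because `net.SplitHostPort` understands bracketed IPv6 literals, the same helper also works for targets such as `[fd00::1]:8080`.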
✅ Passed checks (13 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title clearly and specifically describes the main change: adding latency measurement for cluster-to-cluster probes, which aligns with all the file changes.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Stable And Deterministic Test Names (✅ Passed): PR contains no Ginkgo tests. Go unit tests and Robot Framework tests use static, deterministic names without dynamic content.
  • Test Structure And Quality (✅ Passed): No Ginkgo tests present in PR. Tests use the Go testing package with testify and Robot Framework instead; check is not applicable.
  • Microshift Test Compatibility (✅ Passed): No Ginkgo e2e tests added. PR contains Go unit tests (latency_test.go using testify) and Robot Framework tests (probe.robot). Check is not applicable.
  • Single Node Openshift (Sno) Test Compatibility (✅ Passed): No Ginkgo e2e tests added; only Robot Framework tests and standard Go unit tests are present, which fall outside the scope of the SNO compatibility check.
  • Topology-Aware Scheduling Compatibility (✅ Passed): PR adds latency measurement to C2CC probes via CRD schema, types, ring buffer logic, and tests. No deployment manifests, pod definitions, or scheduling constraints present.
  • Ote Binary Stdout Contract (✅ Passed): PR adds no direct stdout writes and uses klog (writes to stderr by default). No process-level code, init functions, or top-level initializers with side effects violate the OTE stdout contract.
  • Ipv6 And Disconnected Network Test Compatibility (✅ Passed): No Ginkgo e2e tests added. PR includes unit tests (testify-based) and Robot Framework tests, neither of which fall under the Ginkgo check scope.
  • No-Weak-Crypto (✅ Passed): PR implements latency measurement (network RTT statistics) with no weak crypto (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto, or non-constant-time comparisons detected.
  • Container-Privileges (✅ Passed): PR adds latency measurement for C2CC probes via CRD schema updates, Go types/logic, and tests. No container security settings, privileged modes, or capability escalations introduced.

openshift-ci Bot (Contributor) commented Jun 3, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pmtk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 3, 2026
coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/suites/c2cc/probe.robot`:
- Around line 88-101: The test "Latency Stats Are Reasonable" currently only
checks presence via Should Not Be Empty for fields retrieved by Get Latency
Field; either rename the test to something like "Latency Stats Present" to
reflect presence-only checks, or add numeric/bounds validation: for each ${avg},
${min}, ${max}, ${last}, ${stddev} convert to numbers (e.g., using Convert To
Number or your helper) and assert ${min} >= 0, ${stddev} >= 0, ${min} <= ${avg}
<= ${max}, ${last} within [${min}, ${max}], and optionally enforce an upper
bound on ${max} (e.g., <200 ms) to reflect "reasonable" for local-network
probes; update the test body accordingly where Get Latency Field and Should Not
Be Empty are used.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 282796d3-2f96-4f81-b68f-0c9a5e284a70

📥 Commits

Reviewing files that changed from the base of the PR and between 8d4bf57 and 1c6513e.

⛔ Files ignored due to path filters (1)
  • pkg/apis/microshift/v1alpha1/zz_generated.deepcopy.go is excluded by !**/zz_generated*
📒 Files selected for processing (6)
  • assets/crd/microshift.io_remoteclusters.yaml
  • pkg/apis/microshift/v1alpha1/types.go
  • pkg/controllers/c2cc/latency.go
  • pkg/controllers/c2cc/latency_test.go
  • pkg/controllers/c2cc/probe.go
  • test/suites/c2cc/probe.robot

Comment thread test/suites/c2cc/probe.robot Outdated
@pmtk pmtk changed the title USHIFT-6799: C2CC: Latency between clusters USHIFT-6799: C2CC: Latency measurement between clusters Jun 3, 2026
Verifies that RemoteCluster CRs are populated with latency stats
(avg, min, max, last, stddev) after probes have run, and that all
fields contain non-empty duration values.
pmtk (Member, Author) commented Jun 3, 2026

/test e2e-aws-tests-bootc-arm-el9 e2e-aws-tests-bootc-el10

@pmtk pmtk marked this pull request as ready for review June 3, 2026 12:04
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 3, 2026
@openshift-ci openshift-ci Bot requested review from agullon and ggiguash June 3, 2026 12:04
agullon

This comment was marked as outdated.

Comment thread test/suites/c2cc/probe.robot
Comment thread test/suites/c2cc/probe.robot Outdated
Comment thread test/suites/c2cc/probe.robot Outdated
@coderabbitai coderabbitai Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 3, 2026
pmtk (Member, Author) commented Jun 3, 2026

/test e2e-aws-tests

openshift-ci Bot (Contributor) commented Jun 3, 2026

@pmtk: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

agullon (Contributor) commented Jun 3, 2026

It looks good to me from a QE POV.


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • ready-for-human-review: Indicates a PR has been reviewed by automated tools and is ready for human review.

Projects

None yet

3 participants