OCPBUGS-82010: Retry fetching load balancer information in tests #443

Open
nrb wants to merge 1 commit into openshift:main from nrb:OCPBUGS-82010

@nrb
Contributor

nrb commented Apr 8, 2026

Provisioning a load balancer is not immediate, and the tests did not retry the lookup.
With RHCOS10, the provisioning is slower, which fails the whole test when the load balancer can't be found.

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
Assisted-By: Claude Sonnet 4.6
openshift-ci-robot added the jira/valid-reference (indicates that this PR references a valid Jira ticket of any type) and jira/invalid-bug (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) labels on Apr 8, 2026
@openshift-ci-robot

@nrb: This pull request references Jira Issue OCPBUGS-82010, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

openshift-ci bot requested review from chrischdi and damdo on April 8, 2026, 14:19
@openshift-ci
Contributor

openshift-ci bot commented Apr 8, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign racheljpg for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai bot commented Apr 8, 2026

Walkthrough

The getNLBMetaFromName function in the e2e test file was refactored to add retry logic using Eventually with a 3-minute timeout and 10-second polling interval, replacing a single synchronous DNS lookup call. The retry mechanism handles transient nil results and logs intermediate "not found" conditions.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| NLB metadata retrieval retry logic: cmd/cloud-controller-manager-aws-tests-ext/e2e/loadbalancer.go | Replaced the synchronous getAWSLoadBalancerFromDNSName call with an Eventually retry loop (3-minute timeout, 10-second polling) to handle transient failures and nil results when fetching NLB metadata by DNS name. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

@nrb
Contributor Author

nrb commented Apr 8, 2026

/jira refresh

openshift-ci-robot added the jira/valid-bug label (indicates that a referenced Jira bug is valid for the branch this PR is targeting) and removed the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) on Apr 8, 2026
@openshift-ci-robot

@nrb: This pull request references Jira Issue OCPBUGS-82010, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)


coderabbitai bot left a comment

🧹 Nitpick comments (1)
cmd/cloud-controller-manager-aws-tests-ext/e2e/loadbalancer.go (1)

559-573: Retry logic addresses the provisioning timing issue well.

The Eventually block with 3-minute timeout is appropriate for handling slower load balancer provisioning.

One minor observation: lines 566-569 appear to be unreachable. Based on getAWSLoadBalancerFromDNSName (helper.go:56-58), when the load balancer isn't found, it returns (nil, error) — never (nil, nil). So after the if err != nil check passes, lb is guaranteed to be non-nil.

🔧 Optional: Remove unreachable code
 	var foundLB *elbv2types.LoadBalancer
 	Eventually(ctx, func(ctx context.Context) error {
 		lb, err := getAWSLoadBalancerFromDNSName(ctx, elbc, lbDNS)
 		if err != nil {
 			framework.Logf("Failed to find load balancer with DNS %s: %v", lbDNS, err)
 			return err
 		}
-		if lb == nil {
-			framework.Logf("Load balancer %s not found yet", lbDNS)
-			return fmt.Errorf("load balancer not found yet")
-		}
 		foundLB = lb
 		return nil
 	}).WithTimeout(3*time.Minute).WithPolling(10*time.Second).Should(Succeed(),
 		"failed to find load balancer with DNS name %s", lbDNS)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/cloud-controller-manager-aws-tests-ext/e2e/loadbalancer.go` around lines
559 - 573, The check for lb == nil inside the Eventually loop is unreachable
because getAWSLoadBalancerFromDNSName returns (nil, error) when not found;
remove the redundant if lb == nil block and its associated log/err return,
leaving only the err check and assignment to foundLB after a successful
getAWSLoadBalancerFromDNSName call in the Eventually closure (referencing
foundLB and getAWSLoadBalancerFromDNSName to locate the code inside the
Eventually(...) block).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d2f66772-4230-4432-b350-71819eb43368

📥 Commits

Reviewing files that changed from the base of the PR and between 4f5632a and b529237.

📒 Files selected for processing (1)
  • cmd/cloud-controller-manager-aws-tests-ext/e2e/loadbalancer.go

@nrb
Contributor Author

nrb commented Apr 8, 2026

/test e2e-aws-ovn

@mtulio
Contributor

mtulio commented Apr 8, 2026

/assign

@openshift-ci
Contributor

openshift-ci bot commented Apr 8, 2026

@nrb: all tests passed!

@mtulio
Contributor

mtulio commented Apr 9, 2026

/payload-job periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-rhcos10-techpreview

@openshift-ci
Contributor

openshift-ci bot commented Apr 9, 2026

@mtulio: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-rhcos10-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7166a4a0-33b7-11f1-87aa-b966f53712a3-0

Labels

  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.

3 participants