NO-ISSUE: [release-4.21] Stabilize e2e-ocl test suite #5667
umohnani8 wants to merge 3 commits into openshift:release-4.21
Three fixes to address intermittent test failures in CI:

1. TestControllerEventuallyReconciles timeout issue:
   - Increased job completion timeout from 10 to 20 minutes
   - Test simulates adverse conditions (scaled down deployments)
   - Image builds can take longer in resource-constrained CI environments

2. Rate limiter exhaustion in log streaming:
   - Increased log streaming retry interval from 2s to 5s
   - Multiple concurrent goroutines were making API calls too frequently
   - The resulting 60% reduction in API call rate prevents rate limiter exhaustion

3. HTTP/2 connection errors failing tests:
   - Made log streaming errors non-fatal (log warnings instead)
   - API server closes long-running log streams when pods terminate
   - Log collection is for debugging, not a test requirement
   - Tests now pass/fail based on actual functionality
External image registries (Docker Hub, GitHub Container Registry) have changed their API error responses over time:
- Docker.io now returns imageNotFound for nonexistent repos (was accessDenied)
- ghcr.io now returns imageNotFound for nonexistent tags (was accessDenied)

Updated test to accept either error type when both flags are set:
- Modified inspectTestFunc and deleteTestFunc to treat both flags as "accept either"
- Updated Docker.io inspect/nonexistentRepo case to accept both error types
- Updated GitHub registry delete/nonexistentTag case to accept both error types

Both error types are tolerable for ImagePruner functionality, so tests should not be brittle to registry-specific error response changes.
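A minimal sketch of the "accept either" rule follows. The `errKind` and `expectation` types and the `matches` method are hypothetical and stand in for whatever fields inspectTestFunc and deleteTestFunc actually consult; only the rule itself (both flags set means either error passes) comes from the description above.

```go
package main

import "fmt"

// errKind is a hypothetical classification of a registry response.
type errKind int

const (
	errNone errKind = iota
	errAccessDenied
	errImageNotFound
)

// expectation mirrors the idea of two per-test-case flags (names assumed).
type expectation struct {
	accessDenied  bool
	imageNotFound bool
}

// matches reports whether the observed error kind satisfies the expectation.
// When both flags are set, either error kind is accepted.
func (e expectation) matches(got errKind) bool {
	switch {
	case e.accessDenied && e.imageNotFound:
		return got == errAccessDenied || got == errImageNotFound
	case e.accessDenied:
		return got == errAccessDenied
	case e.imageNotFound:
		return got == errImageNotFound
	default:
		return got == errNone
	}
}

func main() {
	either := expectation{accessDenied: true, imageNotFound: true}
	fmt.Println(either.matches(errAccessDenied))  // true
	fmt.Println(either.matches(errImageNotFound)) // true
	fmt.Println(either.matches(errNone))          // false
}
```

Keeping the single-flag cases strict means only the test cases that opt in by setting both flags become tolerant of registry behavior changes.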
The cleanupEphemeralBuildObjects function was experiencing intermittent timeout failures during cleanup verification, even though deletions were succeeding. Each verification used the default 5-minute timeout with a 1s poll interval, which could exhaust rate limits and lead to the "context deadline exceeded" error.

Create a dedicated 2-minute timeout context for cleanup verification and increase the poll interval from 1s to 3s to reduce the API call rate. This allows about 40 attempts per resource.

Signed-off-by: Urvashi <umohnani@redhat.com>
@umohnani8: This pull request explicitly references no jira issue.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: umohnani8
@umohnani8: all tests passed!
This is a manual backport of #5652, #5613, and #5595 to stabilize the e2e-ocl test suite.
This helps complete https://issues.redhat.com/browse/MCO-2130