-
Notifications
You must be signed in to change notification settings - Fork 54
fix: Prevent CI status getting stuck in 'pending' state (race condition) #952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #952 +/- ##
==========================================
- Coverage 86.88% 86.81% -0.08%
==========================================
Files 35 35
Lines 3759 3822 +63
Branches 767 776 +9
==========================================
+ Hits 3266 3318 +52
- Misses 355 364 +9
- Partials 138 140 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
0a0845b to
6152875
Compare
This fix addresses a critical race condition that caused GitHub CI status to remain stuck as "pending" / "Tests queued" even after tests failed. | Time | Event | GitHub Status | |------|-------|---------------| | 10:29:32 | Workflow requested, tests scheduled | pending - "Waiting for actions" | | 10:30:54 | Linux test fails (artifact not ready) | error - "No build artifact found" | | 10:31:24 | Linux artifact created | — | | 10:31:43 | Windows test fails (artifact not ready) | error - "No build artifact found" | | 11:08:23 | Windows artifact created | — | | 11:09:22 | Workflow completes, NEW tests queued | pending - "Tests queued" ← STUCK | 1. **Race Condition**: Tests were queued before artifacts were available. The workflow_run.completed webhook fired, but GitHub's artifact API hadn't propagated the artifacts yet (29 seconds gap for Linux). 2. **Status Override**: When new tests (7138/7139) were queued at 11:09:22, the status was set to "pending", overwriting the previous "error" status. 3. **Silent Failure**: The `mark_test_failed` function caught all exceptions silently, so if GitHub status update failed, no error was reported and the status remained stuck as "pending". - Separated database and GitHub updates so one failing doesn't block the other - Added detailed logging for each step (db_success, github_success) - Now includes target_url in GitHub status for better traceability - Logs CRITICAL error when both updates fail - Added `verify_artifacts_exist()` check before calling `queue_test()` - If workflow succeeded but artifact isn't found, posts ERROR status: "Build succeeded but artifact not yet available - please retry" - Prevents queuing tests that will immediately fail - New `find_artifact_for_commit()` function with proper pagination - Searches up to MAX_ARTIFACTS_TO_SEARCH (500) artifacts - Better error handling and logging - Used by both webhook handler and start_test() - All artifact search operations now log progress - Test failures include test ID in log messages - Artifact availability is logged before queuing - `mod_ci/controllers.py`: Core fixes - `tests/test_ci/test_controllers.py`: Added 12 new tests - TestFindArtifactForCommit: 5 tests - TestVerifyArtifactsExist: 3 tests - TestMarkTestFailedImproved: 4 tests 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…rror Works around different PyGithub versions having different type stubs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests that exercise the successful workflow completion path now mock
verify_artifacts_exist to return {'linux': True, 'windows': True}.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
6152875 to
c8d796b
Compare
|



Summary
This PR fixes a critical race condition that caused GitHub CI status to remain permanently stuck as "pending" / "Tests queued" even after tests had actually failed. This issue was discovered while investigating CCExtractor PR #1880.
Key fixes:
Bug Analysis
Real-World Case: CCExtractor PR #1880
The CI status for PR #1880 showed "Waiting for status to be reported — Tests queued" for both Linux and Windows, despite the tests having actually failed hours earlier.
Timeline of Events (2025-12-23, all times UTC)
pending- "Waiting for actions to complete"error- "No build artifact found"error- "No build artifact found"pending- "Tests queued" ← STUCK HERERoot Causes Identified
1. Race Condition (Primary Issue)
When the
workflow_run.completedwebhook fires, the Sample Platform immediately queues tests. However, GitHub's artifact API may not have propagated the artifacts yet. In this case:2. Status Override
When new tests were queued at 11:09:22, the GitHub status was set to "pending" with "Tests queued", overwriting the previous "error" status. GitHub's commit status API replaces the previous status for each context.
3. Silent Failure in
mark_test_failedThe function caught all exceptions and only logged them:
Solution
1. Artifact Verification Before Queuing Tests
Added
verify_artifacts_exist()function that checks if artifacts are available before callingqueue_test(). If the workflow succeeded but artifacts aren't found (race condition), it posts an ERROR status:This prevents queuing tests that will immediately fail.
2. Robust
mark_test_failedFunctionCompletely rewrote to ensure GitHub is always notified:
target_urlfor better traceability3. Improved Artifact Search
New
find_artifact_for_commit()function:MAX_ARTIFACTS_TO_SEARCH)start_test()Files Changed
mod_ci/controllers.pytests/test_ci/test_controllers.pyNew Tests
TestFindArtifactForCommit(5 tests): Success, not found, wrong platform, exception handling, max search limitTestVerifyArtifactsExist(3 tests): Both exist, only Linux, neither existsTestMarkTestFailedImproved(4 tests): Both succeed, DB fails, both fail, includes target_urlTest Plan
Related
🤖 Generated with Claude Code