Skip to content

Claude harness: avoid invalid --continue retries after SIGTERM/no-deferred-marker#31194

Merged
pelikhan merged 6 commits into
mainfrom
copilot/aw-failures-fix-continue-error
May 9, 2026
Merged

Claude harness: avoid invalid --continue retries after SIGTERM/no-deferred-marker#31194
pelikhan merged 6 commits into
mainfrom
copilot/aw-failures-fix-continue-error

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 9, 2026

Bug Fix

What was the bug?

claude_harness.cjs treated failed attempts with output as resumable via --continue. After SIGTERM/SIGKILL-style termination (143/137) or No deferred tool marker found, retry flow could enter an invalid continue path and fail prematurely. It could also accidentally keep a caller-supplied --continue on “fresh” retries.

How did you fix it?

  • Retry path hardening

    • Added isSignalTerminationExitCode() for 137/143.
    • Added shouldRetryWithContinue() to centralize continue eligibility checks.
    • Continue retries are blocked for signal-style exits and no-deferred-marker conditions.
  • Fresh-retry fallback instead of terminal failure

    • On isNoDeferredMarkerError, the harness now falls through to a fresh retry (when budget remains) instead of stopping immediately.
    • --continue is permanently disabled after invalid continue-path detection.
    • Fresh-retry args now explicitly strip user-supplied --continue so non-continue retries are truly fresh.
  • Failure reason clarity

    • Added explicit diagnostic fingerprints in logs:
      • failure_reason=harness_retry_path_invalid
      • failure_reason=cancelled_or_timed_out
  • Unit coverage for retry matrix

    • Added/updated tests for:
      • exitCode=143shouldRetryWithContinue(...) === false
      • isNoDeferredMarker=trueshouldRetryWithContinue(...) === false
      • normal partial-execution case remains continue-eligible
      • user-supplied --continue is removed on fresh retry after invalid continue-path detection
function shouldRetryWithContinue({ exitCode, isNoDeferredMarker, ...rest }) {
  if (isSignalTerminationExitCode(exitCode)) return false; // 137/143
  if (isNoDeferredMarker) return false;
  // ...other guards...
  return true;
}

Testing

Targeted harness unit tests were updated and run to validate the retry-decision matrix, fresh-retry fallback behavior, and user-supplied --continue stripping on fresh retries.

Copilot AI and others added 3 commits May 9, 2026 11:47
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix continue error after SIGTERM in Claude harness Claude harness: avoid invalid --continue retries after SIGTERM/no-deferred-marker May 9, 2026
Copilot AI requested a review from pelikhan May 9, 2026 11:50
@pelikhan pelikhan marked this pull request as ready for review May 9, 2026 11:57
Copilot AI review requested due to automatic review settings May 9, 2026 11:57
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Claude harness retry behavior to avoid invalid --continue retries after signal-style termination and after No deferred tool marker found, falling back to fresh retries when appropriate.

Changes:

  • Added isSignalTerminationExitCode() and shouldRetryWithContinue() to centralize/strengthen --continue retry eligibility.
  • Changed no-deferred-marker handling to disable --continue permanently and fall back to fresh retries when budget remains.
  • Added unit tests covering the new retry-decision matrix helpers.
Show a summary per file
File Description
actions/setup/js/claude_harness.cjs Adds new helper functions and hardens retry logic to avoid invalid --continue paths and improve diagnostics.
actions/setup/js/claude_harness.test.cjs Adds unit tests for the new helper functions and retry eligibility conditions.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 2/2 changed files
  • Comments generated: 2

Comment thread actions/setup/js/claude_harness.cjs Outdated
@@ -79,6 +79,7 @@ const MAX_TURNS_EXIT_PATTERN = /"subtype"\s*:\s*"error_max_turns"/;
// window. Retrying with --continue will always produce the same instant failure, so
// this is a deterministic terminal condition that must not be retried.
log(`attempt ${attempt + 1}: no deferred tool marker — not retriable via --continue`);
if (attempt < MAX_RETRIES && result.hasOutput) {
useContinueOnRetry = false;
continueDisabledPermanently = true;
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

Necromancer fortified this PR with fresh regression coverage.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 88/100

Excellent test quality

Metric Value
New/modified tests analyzed 5
✅ Design tests (behavioral contracts) 5 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 3 (60%)
Duplicate test clusters 0
Test inflation detected No (ratio: 0.73)
🚨 Coding-guideline violations 0

Test Classification Details

View all 5 tests
Test File Classification Issues Detected
isSignalTerminationExitCode – returns true for SIGKILL/SIGTERM-style exit codes claude_harness.test.cjs ✅ Design None
isSignalTerminationExitCode – returns false for non-signal exit codes claude_harness.test.cjs ✅ Design Boundary values (1, 2)
shouldRetryWithContinue – does not use --continue for signal-style termination exitCode=143 claude_harness.test.cjs ✅ Design Covers the core bug fix
shouldRetryWithContinue – returns true for normal partial-execution retry claude_harness.test.cjs ✅ Design Happy path
shouldRetryWithContinue – returns false when no-deferred-marker error is present claude_harness.test.cjs ✅ Design Edge/error case

Flagged Tests — Requires Review

No tests flagged. All tests verify observable behavioral contracts.


Score Breakdown

Component Score Notes
Behavioral Coverage (40 pts) 40/40 5/5 tests verify observable outputs
Error/Edge Case Coverage (30 pts) 18/30 3/5 tests include edge/boundary cases
Low Duplication (20 pts) 20/20 No duplicate clusters
Proportional Growth (10 pts) 10/10 Test/production ratio 0.73 (below 2:1 threshold)
Total 88/100

Language Support

Tests analyzed:

  • 🟨 JavaScript (*.test.cjs): 5 tests (vitest) — no vi.* mocking; all tests exercise functions directly with plain object inputs

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). The new tests directly verify the behavioral contracts introduced by this PR — specifically that shouldRetryWithContinue returns false for SIGTERM/SIGKILL exit codes (143, 137) and for the no-deferred-marker condition, which are the exact scenarios the fix targets.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.


References: §25600541860

🧪 Test quality analysis by Test Quality Sentinel · ● 5.9M ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 88/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 5 new tests verify behavioral contracts directly relevant to the bug fix.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /diagnose and /tdd — this is a targeted bug fix addressing invalid --continue retry paths after signal-style termination.

Key Themes

  • Dead parameter in shouldRetryWithContinue: The isNoDeferredMarker parameter (and its guard inside the function) is unreachable at the only call site in main, because the isNoDeferredMarker early-exit block always continues or breaks before reaching the retry section. This makes the parameter misleading and its test case moot — see inline comment.

  • Test coverage gaps on shouldRetryWithContinue's early-guard compound condition: The first if (attempt >= maxRetries || !hasOutput || continueDisabledPermanently) has three sub-conditions, none of which are individually tested — see inline comment.

Positive Highlights

  • ✅ Extracting shouldRetryWithContinue into a pure function is exactly the right move — it's easily testable in isolation and makes the retry logic readable.
  • continueDisabledPermanently as a sticky flag prevents re-entering a broken continue-path on subsequent attempts — clean state machine design.
  • ✅ Explicit failure_reason= log tokens are a nice addition for post-hoc diagnostics and log-based alerting.
  • SIGNAL_TERMINATION_EXIT_CODES as a Set is efficient and self-documenting.

Verdict

Commentary only — the core fix is sound and well-structured. The two issues flagged are a correctness concern (dead parameter that implies a reachable code path that doesn't exist) and a test-completeness concern. Neither blocks the fix, but addressing them would make the retry logic more trustworthy long-term.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 4.7M

* }} input
* @returns {boolean}
*/
function shouldRetryWithContinue({ attempt, maxRetries, exitCode, hasOutput, isNoDeferredMarker, continueDisabledPermanently }) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] The isNoDeferredMarker guard inside shouldRetryWithContinue is dead code at the call site.

The early-exit block in main always either continues or breaks when isNoDeferredMarker is true, so by the time shouldRetryWithContinue is invoked, isNoDeferredMarker is invariably false. The test exercising isNoDeferredMarker: true therefore covers a path that cannot occur in production.

Consider removing the parameter and its guard from shouldRetryWithContinue, so the function contract matches its actual call sites.

hasOutput: true,
isNoDeferredMarker: true,
continueDisabledPermanently: false,
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The three guards in shouldRetryWithContinue that fire before the signal-termination check are untested:

  • attempt >= maxRetries → should return false
  • !hasOutput → should return false
  • continueDisabledPermanently: true → should return false

Suggested additions:

it('returns false when attempt has reached maxRetries', () => {
  expect(shouldRetryWithContinue({ attempt: 3, maxRetries: 3, exitCode: 1, hasOutput: true, isNoDeferredMarker: false, continueDisabledPermanently: false })).toBe(false);
});
it('returns false when hasOutput is false', () => {
  expect(shouldRetryWithContinue({ attempt: 0, maxRetries: 3, exitCode: 1, hasOutput: false, isNoDeferredMarker: false, continueDisabledPermanently: false })).toBe(false);
});
it('returns false when continueDisabledPermanently is true', () => {
  expect(shouldRetryWithContinue({ attempt: 0, maxRetries: 3, exitCode: 1, hasOutput: true, isNoDeferredMarker: false, continueDisabledPermanently: true })).toBe(false);
});

Without these, a future refactor of the early-guard line could silently regress retry behaviour.

@pelikhan
Copy link
Copy Markdown
Collaborator

pelikhan commented May 9, 2026

@copilot review all comments

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

Commit pushed: 92c1f55

🧟 Regression revived by Necromancer

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

Necromancer regression pass complete.

Linked issue investigated: #31177, covering Claude harness retries that incorrectly used --continue after SIGTERM-style exits and then failed with No deferred tool marker found.

Regression risk found: the PR had helper-level coverage, but the retry loop itself could still regress by re-entering --continue or stopping after a no-deferred-marker failure.

Tests added: updated actions/setup/js/claude_harness.test.cjs with subprocess-based harness regression tests for both paths: normal partial failure -> --continue -> no-deferred-marker -> fresh retry, and SIGTERM-style 143 -> fresh retry without --continue. Also expanded the retry matrix to cover both 137 and 143.

Validation run:

  • make build && make fmt passed after the first edit.
  • node --input-type=module --check < actions/setup/js/claude_harness.test.cjs passed.
  • git diff --check passed.
  • Manual harness scenarios passed for both no-deferred-marker fallback and signal-style fresh retry.
  • npm run test:js -- claude_harness.test.cjs could not run because vitest was not installed. Attempts to run npm ci failed with npm internal Exit handler never called, and npx vitest@4.1.5 failed fetching from npm with ERR_SSL_PACKET_LENGTH_TOO_LONG.
  • make agent-report-progress failed at fmt-cjs for the same missing JS dependency state (npm run format:cjs exit 127).

Changes pushed to PR branch: f6ac99c Add Claude harness retry regression tests.

🧟 Regression revived by Necromancer ·

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 9, 2026

@copilot review all comments

Addressed the review feedback in commit 05bff43.

  • Updated the no-deferred-marker comments to match current behavior (block --continue, allow fresh-run fallback).
  • Fixed fresh retry arg handling so user-supplied --continue is stripped on non-continue retries, preventing retry loops.
  • Added regression coverage for the user-supplied --continue scenario.

Validated with npx vitest run claude_harness.test.cjs and npm run lint:cjs in actions/setup/js.

@pelikhan pelikhan merged commit aded0f8 into main May 9, 2026
@pelikhan pelikhan deleted the copilot/aw-failures-fix-continue-error branch May 9, 2026 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[aw-failures] Claude harness --continue after SIGTERM(143) emits 'No deferred tool marker' instead of starting a fresh attempt

3 participants