Claude harness: avoid invalid `--continue` retries after SIGTERM/no-deferred-marker by Copilot · Pull Request #31194 · github/gh-aw

Copilot · 2026-05-09T11:38:34Z

Bug Fix

What was the bug?

claude_harness.cjs treated failed attempts with output as resumable via --continue. After SIGTERM/SIGKILL-style termination (143/137) or No deferred tool marker found, retry flow could enter an invalid continue path and fail prematurely. It could also accidentally keep a caller-supplied --continue on “fresh” retries.

How did you fix it?

Retry path hardening
- Added isSignalTerminationExitCode() for 137/143.
- Added shouldRetryWithContinue() to centralize continue eligibility checks.
- Continue retries are blocked for signal-style exits and no-deferred-marker conditions.
Fresh-retry fallback instead of terminal failure
- On isNoDeferredMarkerError, the harness now falls through to a fresh retry (when budget remains) instead of stopping immediately.
- --continue is permanently disabled after invalid continue-path detection.
- Fresh-retry args now explicitly strip user-supplied --continue so non-continue retries are truly fresh.
Failure reason clarity
- Added explicit diagnostic fingerprints in logs:
  - failure_reason=harness_retry_path_invalid
  - failure_reason=cancelled_or_timed_out
Unit coverage for retry matrix
- Added/updated tests for:
  - exitCode=143 ⇒ shouldRetryWithContinue(...) === false
  - isNoDeferredMarker=true ⇒ shouldRetryWithContinue(...) === false
  - normal partial-execution case remains continue-eligible
  - user-supplied --continue is removed on fresh retry after invalid continue-path detection

function shouldRetryWithContinue({ exitCode, isNoDeferredMarker, ...rest }) {
  if (isSignalTerminationExitCode(exitCode)) return false; // 137/143
  if (isNoDeferredMarker) return false;
  // ...other guards...
  return true;
}

Testing

Targeted harness unit tests were updated and run to validate the retry-decision matrix, fresh-retry fallback behavior, and user-supplied --continue stripping on fresh retries.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Fixes Claude harness retry behavior to avoid invalid --continue retries after signal-style termination and after No deferred tool marker found, falling back to fresh retries when appropriate.

Changes:

Added isSignalTerminationExitCode() and shouldRetryWithContinue() to centralize/strengthen --continue retry eligibility.
Changed no-deferred-marker handling to disable --continue permanently and fall back to fresh retries when budget remains.
Added unit tests covering the new retry-decision matrix helpers.

Show a summary per file

File	Description
actions/setup/js/claude_harness.cjs	Adds new helper functions and hardens retry logic to avoid invalid `--continue` paths and improve diagnostics.
actions/setup/js/claude_harness.test.cjs	Adds unit tests for the new helper functions and retry eligibility conditions.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 2/2 changed files
Comments generated: 2

@@ -79,6 +79,7 @@ const MAX_TURNS_EXIT_PATTERN = /"subtype"\s*:\s*"error_max_turns"/;
 // window.  Retrying with --continue will always produce the same instant failure, so
 // this is a deterministic terminal condition that must not be retried.


-      log(`attempt ${attempt + 1}: no deferred tool marker — not retriable via --continue`);
+      if (attempt < MAX_RETRIES && result.hasOutput) {
+        useContinueOnRetry = false;
+        continueDisabledPermanently = true;


github-actions · 2026-05-09T12:06:33Z

✅ Necromancer fortified this PR with fresh regression coverage.

github-actions · 2026-05-09T12:11:34Z

🧪 Test Quality Sentinel Report

Test Quality Score: 88/100

✅ Excellent test quality

Metric	Value
New/modified tests analyzed	5
✅ Design tests (behavioral contracts)	5 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	3 (60%)
Duplicate test clusters	0
Test inflation detected	No (ratio: 0.73)
🚨 Coding-guideline violations	0

Test Classification Details

View all 5 tests

Test	File	Classification	Issues Detected
`isSignalTerminationExitCode` – returns true for SIGKILL/SIGTERM-style exit codes	`claude_harness.test.cjs`	✅ Design	None
`isSignalTerminationExitCode` – returns false for non-signal exit codes	`claude_harness.test.cjs`	✅ Design	Boundary values (1, 2)
`shouldRetryWithContinue` – does not use --continue for signal-style termination exitCode=143	`claude_harness.test.cjs`	✅ Design	Covers the core bug fix
`shouldRetryWithContinue` – returns true for normal partial-execution retry	`claude_harness.test.cjs`	✅ Design	Happy path
`shouldRetryWithContinue` – returns false when no-deferred-marker error is present	`claude_harness.test.cjs`	✅ Design	Edge/error case

Flagged Tests — Requires Review

No tests flagged. All tests verify observable behavioral contracts.

Score Breakdown

Component	Score	Notes
Behavioral Coverage (40 pts)	40/40	5/5 tests verify observable outputs
Error/Edge Case Coverage (30 pts)	18/30	3/5 tests include edge/boundary cases
Low Duplication (20 pts)	20/20	No duplicate clusters
Proportional Growth (10 pts)	10/10	Test/production ratio 0.73 (below 2:1 threshold)
Total	88/100

Language Support

Tests analyzed:

🟨 JavaScript (*.test.cjs): 5 tests (vitest) — no vi.* mocking; all tests exercise functions directly with plain object inputs

Verdict

✅ Check passed. 0% of new tests are implementation tests (threshold: 30%). The new tests directly verify the behavioral contracts introduced by this PR — specifically that shouldRetryWithContinue returns false for SIGTERM/SIGKILL exit codes (143, 137) and for the no-deferred-marker condition, which are the exact scenarios the fix targets.

📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

Assert on observable outputs, return values, or state changes
Cover error paths and boundary conditions
Would catch a behavioral regression if deleted
Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

Assert on internal function calls (mocking internals)
Only test the happy path with typical inputs
Break during legitimate refactoring even when behavior is correct
Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

References: §25600541860

🧪 Test quality analysis by Test Quality Sentinel · ● 5.9M · ◷

github-actions

✅ Test Quality Sentinel: 88/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 5 new tests verify behavioral contracts directly relevant to the bug fix.

github-actions

Skills-Based Review 🧠

Applied /diagnose and /tdd — this is a targeted bug fix addressing invalid --continue retry paths after signal-style termination.

Key Themes

Dead parameter in shouldRetryWithContinue: The isNoDeferredMarker parameter (and its guard inside the function) is unreachable at the only call site in main, because the isNoDeferredMarker early-exit block always continues or breaks before reaching the retry section. This makes the parameter misleading and its test case moot — see inline comment.
Test coverage gaps on shouldRetryWithContinue's early-guard compound condition: The first if (attempt >= maxRetries || !hasOutput || continueDisabledPermanently) has three sub-conditions, none of which are individually tested — see inline comment.

Positive Highlights

✅ Extracting shouldRetryWithContinue into a pure function is exactly the right move — it's easily testable in isolation and makes the retry logic readable.
✅ continueDisabledPermanently as a sticky flag prevents re-entering a broken continue-path on subsequent attempts — clean state machine design.
✅ Explicit failure_reason= log tokens are a nice addition for post-hoc diagnostics and log-based alerting.
✅ SIGNAL_TERMINATION_EXIT_CODES as a Set is efficient and self-documenting.

Verdict

Commentary only — the core fix is sound and well-structured. The two issues flagged are a correctness concern (dead parameter that implies a reachable code path that doesn't exist) and a test-completeness concern. Neither blocks the fix, but addressing them would make the retry logic more trustworthy long-term.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 4.7M

github-actions · 2026-05-09T12:13:35Z

+ * }} input
+ * @returns {boolean}
+ */
+function shouldRetryWithContinue({ attempt, maxRetries, exitCode, hasOutput, isNoDeferredMarker, continueDisabledPermanently }) {


[/diagnose] The isNoDeferredMarker guard inside shouldRetryWithContinue is dead code at the call site.

The early-exit block in main always either continues or breaks when isNoDeferredMarker is true, so by the time shouldRetryWithContinue is invoked, isNoDeferredMarker is invariably false. The test exercising isNoDeferredMarker: true therefore covers a path that cannot occur in production.

Consider removing the parameter and its guard from shouldRetryWithContinue, so the function contract matches its actual call sites.

github-actions · 2026-05-09T12:13:35Z

+        hasOutput: true,
+        isNoDeferredMarker: true,
+        continueDisabledPermanently: false,
+      });


[/tdd] The three guards in shouldRetryWithContinue that fire before the signal-termination check are untested:

attempt >= maxRetries → should return false

!hasOutput → should return false

continueDisabledPermanently: true → should return false

Suggested additions:

it('returns false when attempt has reached maxRetries', () => { expect(shouldRetryWithContinue({ attempt: 3, maxRetries: 3, exitCode: 1, hasOutput: true, isNoDeferredMarker: false, continueDisabledPermanently: false })).toBe(false); }); it('returns false when hasOutput is false', () => { expect(shouldRetryWithContinue({ attempt: 0, maxRetries: 3, exitCode: 1, hasOutput: false, isNoDeferredMarker: false, continueDisabledPermanently: false })).toBe(false); }); it('returns false when continueDisabledPermanently is true', () => { expect(shouldRetryWithContinue({ attempt: 0, maxRetries: 3, exitCode: 1, hasOutput: true, isNoDeferredMarker: false, continueDisabledPermanently: true })).toBe(false); });

Without these, a future refactor of the early-guard line could silently regress retry behaviour.

pelikhan · 2026-05-09T12:16:07Z

@copilot review all comments

github-actions · 2026-05-09T12:19:59Z

Commit pushed: 92c1f55

🧟 Regression revived by Necromancer

github-actions · 2026-05-09T12:20:00Z

Necromancer regression pass complete.

Linked issue investigated: #31177, covering Claude harness retries that incorrectly used --continue after SIGTERM-style exits and then failed with No deferred tool marker found.

Regression risk found: the PR had helper-level coverage, but the retry loop itself could still regress by re-entering --continue or stopping after a no-deferred-marker failure.

Tests added: updated actions/setup/js/claude_harness.test.cjs with subprocess-based harness regression tests for both paths: normal partial failure -> --continue -> no-deferred-marker -> fresh retry, and SIGTERM-style 143 -> fresh retry without --continue. Also expanded the retry matrix to cover both 137 and 143.

Validation run:

make build && make fmt passed after the first edit.
node --input-type=module --check < actions/setup/js/claude_harness.test.cjs passed.
git diff --check passed.
Manual harness scenarios passed for both no-deferred-marker fallback and signal-style fresh retry.
npm run test:js -- claude_harness.test.cjs could not run because vitest was not installed. Attempts to run npm ci failed with npm internal Exit handler never called, and npx vitest@4.1.5 failed fetching from npm with ERR_SSL_PACKET_LENGTH_TOO_LONG.
make agent-report-progress failed at fmt-cjs for the same missing JS dependency state (npm run format:cjs exit 127).

Changes pushed to PR branch: f6ac99c Add Claude harness retry regression tests.

🧟 Regression revived by Necromancer · ◷

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-05-09T12:27:32Z

@copilot review all comments

Addressed the review feedback in commit 05bff43.

Updated the no-deferred-marker comments to match current behavior (block --continue, allow fresh-run fallback).
Fixed fresh retry arg handling so user-supplied --continue is stripped on non-continue retries, preventing retry loops.
Added regression coverage for the user-supplied --continue scenario.

Validated with npx vitest run claude_harness.test.cjs and npm run lint:cjs in actions/setup/js.

Initial plan

20ce33d

Copilot AI assigned Copilot and pelikhan May 9, 2026

Copilot started work on behalf of pelikhan May 9, 2026 11:38 View session

Copilot AI linked an issue May 9, 2026 that may be closed by this pull request

[aw-failures] Claude harness --continue after SIGTERM(143) emits 'No deferred tool marker' instead of starting a fresh attempt #31177

Closed

Copilot AI and others added 3 commits May 9, 2026 11:47

Fix claude harness retry path for SIGTERM and no-marker resumes

c434c67

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Refine claude retry logs and tests from review feedback

2d3ebc1

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Tighten claude retry decision flow and add no-marker test

9e7b753

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix continue error after SIGTERM in Claude harness~~ Claude harness: avoid invalid --continue retries after SIGTERM/no-deferred-marker May 9, 2026

Copilot finished work on behalf of pelikhan May 9, 2026 11:50

Copilot AI requested a review from pelikhan May 9, 2026 11:50

pelikhan marked this pull request as ready for review May 9, 2026 11:57

Copilot AI review requested due to automatic review settings May 9, 2026 11:57

Copilot started reviewing on behalf of pelikhan May 9, 2026 11:57 View session

Copilot AI reviewed May 9, 2026

View reviewed changes

pelikhan added the necromancer label May 9, 2026

github-actions Bot removed the necromancer label May 9, 2026

github-actions Bot approved these changes May 9, 2026

View reviewed changes

github-actions Bot reviewed May 9, 2026

View reviewed changes

Add Claude harness retry regression tests

92c1f55

Copilot started work on behalf of pelikhan May 9, 2026 12:20 View session

Fix fresh-retry args after invalid --continue path

05bff43

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan May 9, 2026 12:28

pelikhan merged commit aded0f8 into main May 9, 2026

pelikhan deleted the copilot/aw-failures-fix-continue-error branch May 9, 2026 12:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Claude harness: avoid invalid `--continue` retries after SIGTERM/no-deferred-marker#31194

Claude harness: avoid invalid `--continue` retries after SIGTERM/no-deferred-marker#31194
pelikhan merged 6 commits into
mainfrom
copilot/aw-failures-fix-continue-error

Copilot AI commented May 9, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions Bot commented May 9, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot May 9, 2026

Uh oh!

github-actions Bot May 9, 2026

Uh oh!

pelikhan commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

Copilot AI commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@@ -79,6 +79,7 @@ const MAX_TURNS_EXIT_PATTERN = /"subtype"\s:\s"error_max_turns"/;
		// window. Retrying with --continue will always produce the same instant failure, so
		// this is a deterministic terminal condition that must not be retried.

Conversation

Copilot AI commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bug Fix

What was the bug?

How did you fix it?

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

github-actions Bot commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 9, 2026

🧪 Test Quality Sentinel Report

Test Quality Score: 88/100

Test Classification Details

Flagged Tests — Requires Review

Score Breakdown

Language Support

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

Verdict

Uh oh!

github-actions Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot May 9, 2026

Choose a reason for hiding this comment

Uh oh!

pelikhan commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

Copilot AI commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented May 9, 2026 •

edited

Loading

github-actions Bot commented May 9, 2026 •

edited

Loading