Sub-Agent task_complete Bug: Race Condition Causes Response Loss #1324

@PlaneInABottle

Description

Summary

When sub-agents (spawned via the task tool) are instructed to use the task_complete tool, their responses are not captured by the parent agent, resulting in "General-purpose agent did not produce a response." This is a race condition in the sub-agent framework that affects all Claude models (Opus 4.6, Sonnet 4, Sonnet 4.5, Haiku 4.5), not just Opus 4.6.

Root Cause

The Copilot CLI sub-agent framework extracts responses from the sub-agent's text output stream, not from the task_complete summary. When a sub-agent calls task_complete, it terminates the response stream. This creates a race condition:

Happy path (works):

  1. Sub-agent emits text response
  2. Sub-agent calls task_complete
  3. Parent captures text before stream terminates
  4. ✅ Response returned successfully

Bug path (fails):

  1. Sub-agent calls task_complete as first/only action
  2. Response stream terminates immediately
  3. Parent receives empty/null response
  4. ❌ "General-purpose agent did not produce a response"
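
The race above can be modeled with a toy producer/consumer sketch (this is an illustration of the failure mode, not Copilot CLI internals; all names here are hypothetical):

```python
# Toy model of the race: the parent captures only what reached the sub-agent's
# text stream before task_complete tore the stream down.
import queue

def run_subagent(events, emit_text_first):
    """Simulated sub-agent: optionally emits text, then calls task_complete."""
    if emit_text_first:
        events.put(("text", "Found 12 files in scripts/"))
    # task_complete terminates the stream immediately; text that was never
    # emitted before this point is lost.
    events.put(("close", None))

def parent_collect(events):
    """Simulated parent: reads the stream until it closes."""
    chunks = []
    while True:
        kind, payload = events.get()
        if kind == "close":
            break
        chunks.append(payload)
    return "".join(chunks) or None  # None -> "did not produce a response"

# Happy path: text is on the stream before the close event.
q = queue.Queue()
run_subagent(q, emit_text_first=True)
assert parent_collect(q) == "Found 12 files in scripts/"

# Bug path: task_complete is the first/only action, so the parent sees nothing.
q = queue.Queue()
run_subagent(q, emit_text_first=False)
assert parent_collect(q) is None
```

The sketch makes the non-determinism plausible: whether any text lands on the stream before the close event depends entirely on what the model chooses to do first.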

Evidence

Experimental Results

Spawned sub-agents with identical tasks, varying only whether they're told to use task_complete:

| Model             | With task_complete  | Without task_complete |
|-------------------|---------------------|-----------------------|
| claude-opus-4.6   | 6/13 success (46%)  | 2/2 success (100%)    |
| claude-sonnet-4.5 | 0/1 fail            | —                     |
| claude-sonnet-4   | 0/1 fail            | 1/1 success           |
| claude-haiku-4.5  | 0/1 fail            | —                     |
| claude-opus-4.5   | 1/1 success         | —                     |
| gpt-5.1           | 1/1 success         | —                     |

Key findings:

  1. Bug affects all Claude models as sub-agents, not just Opus 4.6
  2. Failure is non-deterministic (46% success rate for Opus 4.6)
  3. GPT models appear unaffected
  4. Without task_complete instruction: 100% success rate

Non-Deterministic Behavior

Prompt intensity matters:

  • Emphatic prompts ("CRITICAL MUST use task_complete ONLY") → 100% failure
  • Moderate prompts ("Do X and use task_complete") → ~46% success for Opus 4.6

Why? The more the prompt emphasizes task_complete, the more likely the model calls it before or instead of emitting text, triggering the race condition.

Reproduction

Guaranteed Failure (100%)

task(
    agent_type="general-purpose",
    prompt="""
    Your ONLY job is to call task_complete with summary "Test".
    
    CRITICAL: Use task_complete tool. Do not write any other response.
    """,
    mode="sync",
    model="claude-opus-4.6"
)
# Result: "General-purpose agent did not produce a response"

Non-Deterministic Failure (~54% failure rate)

task(
    agent_type="general-purpose",
    prompt="""
    Count files in scripts/ directory.
    
    IMPORTANT: You MUST use task_complete when done.
    """,
    mode="sync",
    model="claude-opus-4.6"
)
# Result: Sometimes works, sometimes "did not produce a response"

Guaranteed Success (100%)

task(
    agent_type="general-purpose",
    prompt="""
    Count files in scripts/ directory.
    
    DO NOT USE task_complete. Return your findings directly.
    """,
    mode="sync",
    model="claude-opus-4.6"
)
# Result: Always returns response

Impact

Affected:

  • All Claude model sub-agents (Opus 4.6, Sonnet 4/4.5, Haiku 4.5)
  • Both sync and background modes
  • Parent agents cannot reliably use Claude sub-agents if prompts mention task_complete

Not affected:

  • GPT models (gpt-5.1, gpt-5.2-codex)
  • Claude Opus 4.5 (appears more reliable but not tested extensively)
  • Direct agent usage (non-sub-agents)

Workaround

For sub-agent prompts: Never instruct sub-agents to use task_complete. The parent framework already handles completion signaling. Instruct them to return findings directly:

task(
    agent_type="general-purpose",
    prompt="""
    Analyze X and report findings.
    
    DO NOT USE task_complete tool. Return your response text directly.
    """,
    mode="sync",
    model="claude-opus-4.6"
)

Success rate: 100% with this workaround

Recommended Fix

The framework should:

  1. Buffer text output before allowing task_complete to terminate the stream
  2. Ensure response serialization completes before terminating agent context
  3. Consider making task_complete a soft signal rather than immediate termination
  4. Or: Auto-suppress task_complete for sub-agents since parent framework already tracks completion

The fundamental issue is that task_complete terminates execution before ensuring the response text is captured, creating an avoidable race condition.
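A minimal sketch of fixes 1 and 3 combined: treat task_complete as a soft signal, keep draining the stream, and fall back to the task_complete summary when no text was captured. The function and event names are illustrative assumptions, not the real Copilot CLI API:

```python
# Sketch: buffer all events before deciding the response, so a sub-agent that
# only called task_complete still yields its summary instead of None.
def collect_response(stream_events):
    """Drain every event first; prefer streamed text, fall back to summary."""
    text_chunks, summary = [], None
    for kind, payload in stream_events:
        if kind == "text":
            text_chunks.append(payload)
        elif kind == "task_complete":
            summary = payload  # soft signal: record it, keep draining
    if text_chunks:
        return "".join(text_chunks)
    return summary  # buffered fallback instead of an empty response

# A task_complete-only sub-agent no longer produces an empty response.
assert collect_response([("task_complete", "Test")]) == "Test"
```

With this shape there is no window in which termination can outrun response capture, because the decision is made only after the stream is fully drained.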

Environment

  • Copilot CLI version: 0.0.405
  • OS: macOS (Darwin)
  • Affected models: All Claude models (tested: Opus 4.6, Sonnet 4, Sonnet 4.5, Haiku 4.5)
  • Working models: GPT-5.1, GPT-5.2-Codex

Related

This issue was initially reported as Opus 4.6-specific, but debugging revealed it affects all Claude models when used as sub-agents. The non-deterministic nature made it appear model-specific initially.
