Description
🔴 Required Information
Describe the Bug:
When a remote agent task fails (either due to a tool exception or an internal error like a sessions database connection failure), the error message is converted to regular conversation content instead of being treated as an error. This causes the error to be propagated to downstream agents as artifacts/messages, polluting the conversation history and leaking internal details (like SQL queries, stack traces) into the LLM prompt.
Steps to Reproduce:
Scenario 1: Session database failure
- Set up a multi-agent system with Agent B (router) calling Agent A (sub-agent) via A2A protocol
- Configure Agent B with a DatabaseSessionService connected to PostgreSQL (see the sketch after this list)
- Cause a database connection failure in Agent B (e.g., by restarting the database or waiting for a connection timeout)
- Send a user message to Agent B that triggers a transfer to Agent A
- Observe that the SQLAlchemy error message becomes part of the conversation history and is visible to the user/LLM
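For concreteness, a minimal sketch of the scenario 1 wiring (the connection string, app name, and runner setup are placeholders, not the exact production config; agent_b refers to the reproduction code at the end of this report):

from google.adk.runners import Runner
from google.adk.sessions import DatabaseSessionService

# Placeholder PostgreSQL URL; the asyncpg dialect matches the error log below.
session_service = DatabaseSessionService(
    db_url="postgresql+asyncpg://adk:adk@postgres:5432/adk_sessions",
)
runner = Runner(
    agent=agent_b,  # router agent from the minimal reproduction code
    app_name="agent_marketing",
    session_service=session_service,
)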
Scenario 2: Tool failure propagated as artifact
- Create Agent A with a tool that raises an exception
- Have Agent B call Agent A via RemoteA2aAgent
- Have Agent B then transfer to Agent C
- Observe that Agent A's error message is sent to Agent C as an artifact (instead of being treated as an error)
Expected Behavior:
- When a task fails (state: "failed"), RemoteA2aAgent should create an Event with error_message set instead of content
- Events with error_message should NOT be included in the conversation history sent to the LLM (sketched below)
- The error should still be logged/propagated for debugging, but not pollute the prompt
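To illustrate the second point, a sketch of the desired filtering, assuming the LLM history is assembled from session events (the actual ADK request-building code may look different):

# Sketch only: events carrying error_message stay in the session for
# debugging but are skipped when building the LLM conversation history.
llm_history = [
    e.content
    for e in session.events
    if e.content and not e.error_message
]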
Observed Behavior:
The failed task's error message is converted to event.content via convert_a2a_task_to_event(), making it part of the conversation history. When the router agent transfers to another agent, the error appears in the context.
Example conversation showing leaked error:
User: Last week performance YOY
[agent_marketing] called tool transfer_to_agent with parameters: {'agent_name': 'agent_analytics'}
[agent_marketing] transfer_to_agent tool returned result: {'result': None}
[agent_analytics] said: (sqlalchemy.dialects.postgresql.asyncpg.InterfaceError) <class 'asyncpg.exceptions._base.InterfaceError'>: connection is closed
[SQL: SELECT sessions.app_name AS sessions_app_name, sessions.user_id AS sessions_user_id, sessions.id AS sessions_id, sessions.state AS sessions_state, sessions.create_time AS sessions_create_time, sessions.update_time AS sessions_update_time
FROM sessions
WHERE sessions.app_name = $1::VARCHAR AND sessions.user_id = $2::VARCHAR AND sessions.id = $3::VARCHAR]
[parameters: ('agent_analytics', 'user@example.com', '93b169b8-35b4-4b8e-9c80-80f3c7f8f29c')]
Environment Details:
- ADK Library Version: 1.1.0
- Desktop OS: Linux (Kubernetes)
- Python Version: 3.11
Model Information:
- Are you using LiteLLM: Yes
- Which model is being used: gemini-2.5-pro
🟡 Optional Information
Regression:
N/A - this appears to be existing behavior
Logs:
Scenario 1: Agent B's session database fails; the error is returned to the caller and becomes part of the conversation.
Scenario 2: Agent A sends this failed task response to Agent B:
{
"id": "097e4d0b-d8c8-4b2b-9bb9-136b547004e3",
"jsonrpc": "2.0",
"result": {
"contextId": "734279ec-91d4-4ae0-9f7b-73094f247500",
"final": true,
"kind": "status-update",
"status": {
"message": {
"kind": "message",
"messageId": "ffa48ac4-8bc3-42b3-9d2f-f06e80eec66e",
"parts": [
{
"kind": "text",
"text": "my exception message"
}
],
"role": "agent"
},
"state": "failed",
"timestamp": "2026-01-28T18:20:03.090828+00:00"
},
"taskId": "6aaf8f8b-ff60-4d78-8672-6566b5821d58"
}
}
Agent B then sends this to Agent C (the error has become an artifact):
{
"id": "5deb9c00-7de5-4c85-b482-8a7be84fc507",
"jsonrpc": "2.0",
"result": {
"artifact": {
"artifactId": "c5ed0c8d-0389-4482-8cc1-4c756e17d75d",
"parts": [
{
"kind": "text",
"text": "my exception message"
}
]
},
"contextId": "18caacab-edaa-4e70-b218-5737a677cf72",
"kind": "artifact-update",
"lastChunk": true,
"taskId": "eb3b41d3-51da-46ec-807d-6b3d20afe4fb"
}
}
Screenshots / Video:
N/A
Additional Context:
The root cause is in RemoteA2aAgent._handle_a2a_response() (google/adk/agents/remote_a2a_agent.py). When processing a failed task, it calls convert_a2a_task_to_event() which puts the error message in event.content. There's no check for TaskState.failed to handle errors differently.
Proposed fix: After convert_a2a_task_to_event(), check if task.status.state == TaskState.failed and if so, create an Event with error_message set (and content empty):
if task and task.status and task.status.state == TaskState.failed:
    error_text = "Remote agent task failed"
    if event.content and event.content.parts:
        text_parts = [
            p.text for p in event.content.parts if hasattr(p, 'text') and p.text
        ]
        if text_parts:
            error_text = " ".join(text_parts)
    event = Event(
        author=self.name,
        error_message=error_text,
        invocation_id=ctx.invocation_id,
        branch=ctx.branch,
    )
The same fix should be applied to streaming task status updates (A2ATaskStatusUpdateEvent handling).
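A hedged sketch of how both call sites could share that guard (the helper name and the exact shape of the streaming handler are assumptions, not actual ADK internals):

from google.adk.events import Event

def _as_error_event(self, event: Event, ctx) -> Event:
    # Hypothetical helper mirroring the fix above: salvage any text parts as
    # the error message, then drop the content so it never reaches the LLM.
    error_text = "Remote agent task failed"
    if event.content and event.content.parts:
        text_parts = [
            p.text for p in event.content.parts if getattr(p, "text", None)
        ]
        if text_parts:
            error_text = " ".join(text_parts)
    return Event(
        author=self.name,
        error_message=error_text,
        invocation_id=ctx.invocation_id,
        branch=ctx.branch,
    )

_handle_a2a_response() would call this when task.status.state == TaskState.failed, and the streaming branch would do the same when a status update arrives with status.state == TaskState.failed.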
Minimal Reproduction Code:
from google.adk.agents import LlmAgent
from google.adk.agents.remote_a2a_agent import RemoteA2aAgent

def failing_tool() -> str:
    """Tool that raises an exception to simulate Agent A's failure."""
    raise RuntimeError("my exception message")

# Agent A - will fail (e.g., tool exception or session DB failure)
agent_a = LlmAgent(
    name="agent_a",
    model="gemini-2.5-flash",
    tools=[failing_tool],  # Tool that raises an exception
)

# Agent B - router that calls Agent A
agent_b = LlmAgent(
    name="agent_b",
    model="gemini-2.5-flash",
    sub_agents=[
        RemoteA2aAgent(
            name="agent_a",
            agent_card="http://agent-a:8080/.well-known/agent-card.json",
        ),
    ],
)

# When agent_b transfers to agent_a and agent_a fails,
# the error message becomes part of agent_b's conversation history
How often has this issue occurred?:
- Always (100%) - whenever a remote agent task fails