
Commit 67ce1ba

Authored by eavanvalkenburg, giles17, and Copilot
Python: fix reasoning model workflow handoff and history serialization (#4083)
* fix: strip function_call and text_reasoning from cross-agent workflow handoff

  When a reasoning model (e.g. gpt-5-mini) runs as Agent 1 in a workflow, its response includes text_reasoning items (with server-scoped IDs like rs_XXXX) and function_call items. Forwarding these to Agent 2 in a fresh conversation caused API errors because the reasoning/call IDs are scoped to the original stored response context.

  Changes:
  - Strip 'function_call', 'text_reasoning', 'function_approval_request', and 'function_approval_response' from handoff messages in _agent_executor.py
  - Keep 'function_result' so the actual tool output content is preserved for the next agent's context
  - Update unit tests to reflect that function_result messages survive handoff (messages grow from 2→3: user, tool(result), assistant(summary))
  - Fix incorrect test assertions in test_function_invocation_stop_clears_* that assumed the client layer updates session.service_session_id
  - Also fixed _extract_function_calls to search all messages with call_id deduplication, and the error-limit stop path to submit function_call_output items before halting (via tool_choice=none cleanup call)

  Relates to: #4047

* fix: reasoning model workflow handoff and history serialization

  Fixes multiple related issues when using reasoning models (gpt-5-mini, gpt-5.2) in multi-agent workflows that chain agents via from_response or replay full conversation history via AgentExecutorRequest.

  ## Reasoning items always emitted on output_item.added

  When a reasoning model produces encrypted or hidden reasoning (no visible text), the Responses API still fires a reasoning output item without any reasoning_text.delta events. Previously no text_reasoning Content was emitted in that case, making it invisible to downstream logic. Both the non-streaming (_parse_response_from_openai) and streaming (output_item.added) paths now always emit at least one text_reasoning Content — with empty text if no content is available — so co-occurrence detection and serialization guards work reliably.

  ## Reasoning items only serialized when paired with a function_call

  The Responses API only accepts reasoning items in input when they directly preceded a function_call in the original response. Sending a reasoning item that preceded a text response (no tool call) causes: "reasoning was provided without its required following item". _prepare_message_for_openai now checks has_function_call per message and skips text_reasoning serialization when there is no accompanying function_call.

  ## summary field is an array, not an object

  The reasoning item summary field sent to the Responses API must be an array of objects ([{"type": "summary_text", "text": ...}]), not a single object. Fixed _prepare_content_for_openai accordingly.

  ## service_session_id cleared when explicit history is provided

  When a workflow coordinator replays a full conversation (including function calls from a previous agent run) back to an executor via AgentExecutorRequest or from_response, the executor's session still held a service_session_id (previous_response_id) from the prior run. The API then received the same function-call items twice — once from previous_response_id (server-stored) and once from the explicit input — causing: "Duplicate item found with id fc_...". AgentExecutor.run (when should_respond=True) and from_response now reset self._session.service_session_id = None before running so that explicit input is the sole source of conversation context.

* small improvements in text reasoning

* refactor: add reset_service_session to AgentExecutorRequest for explicit history replay

  Replace the implicit 'always clear service_session_id when should_respond=True' with an explicit opt-in field on AgentExecutorRequest. The old approach used should_respond=True as a proxy for 'full history replay', but that conflates two distinct intents:
  - Orchestrations group chat sends should_respond=True with an empty/single-message list (not a full replay) — unnecessarily clearing service_session_id.
  - HITL / feedback coordinators send the full prior conversation and truly need a fresh service session ID to avoid duplicate-item API errors.

  Changes:
  - Add AgentExecutorRequest.reset_service_session: bool = False
  - AgentExecutor.run only clears service_session_id when this flag is True
  - AgentExecutor.from_response unchanged (always clears; always full conversation)
  - Set reset_service_session=True in all full-history-replay call sites: agents_with_HITL.py, azure_chat_agents_tool_calls_with_feedback.py, autogen-migration round-robin coordinator, tau2 runner
  - Update _FullHistoryReplayCoordinator test helper to pass the flag

* comment update
* fixes from feedback
* fix test
* reverted changes to agent executor
* fix: remove reset_service_session from tau2 runner
* two other reverts
* fix sample

Co-authored-by: Giles Odigwe <79032838+giles17@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
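The duplicate-item failure mode described above can be illustrated with a toy model of server-stored vs. explicit input. The names below (build_request, server_store) are hypothetical stand-ins, not the framework's API; the real logic lives in AgentExecutor and the Responses client.

```python
# Toy model of the duplicate-item bug: if a previous_response_id is sent
# alongside explicit full-history input, the server replays each stored
# item AND receives it again in the explicit input.
def build_request(session_id, explicit_history, server_store):
    items = list(server_store.get(session_id, []))  # items replayed server-side
    items += explicit_history                       # items sent explicitly
    return items

store = {"resp_1": ["fc_123"]}           # function call stored under the prior response
history = ["fc_123", "tool_output_123"]  # coordinator replays the full conversation

# With the stale session id, fc_123 arrives twice -> "Duplicate item found".
assert build_request("resp_1", history, store).count("fc_123") == 2

# Resetting service_session_id to None makes explicit input the sole source.
assert build_request(None, history, store).count("fc_123") == 1
```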
1 parent 2cb4137 commit 67ce1ba

11 files changed

Lines changed: 445 additions & 66 deletions

File tree

python/packages/core/agent_framework/_tools.py

Lines changed: 34 additions & 15 deletions
@@ -1761,10 +1761,26 @@ def _get_result_hooks_from_stream(stream: Any) -> list[Callable[[Any], Any]]:
 
 
 def _extract_function_calls(response: ChatResponse) -> list[Content]:
-    function_results = {it.call_id for it in response.messages[0].contents if it.type == "function_result"}
-    return [
-        it for it in response.messages[0].contents if it.type == "function_call" and it.call_id not in function_results
-    ]
+    function_results = {
+        item.call_id
+        for message in response.messages
+        for item in message.contents
+        if item.type == "function_result" and item.call_id
+    }
+    seen_call_ids: set[str] = set()
+    function_calls: list[Content] = []
+    for message in response.messages:
+        for item in message.contents:
+            if item.type != "function_call":
+                continue
+            if item.call_id and item.call_id in function_results:
+                continue
+            if item.call_id and item.call_id in seen_call_ids:
+                continue
+            if item.call_id:
+                seen_call_ids.add(item.call_id)
+            function_calls.append(item)
+    return function_calls
 
 
 def _prepend_fcc_messages(response: ChatResponse, fcc_messages: list[Message]) -> None:
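The new extraction logic in this hunk can be sketched standalone. SimpleNamespace stands in for the framework's Message/Content types, which live in agent_framework:

```python
# Minimal sketch of the call-id deduplicating extraction above: scan ALL
# messages (not just the first), skip calls that already have a result,
# and skip repeated call_ids.
from types import SimpleNamespace

def extract_function_calls(messages):
    # Call IDs that already have a matching function_result anywhere in the response.
    answered = {
        item.call_id
        for message in messages
        for item in message.contents
        if item.type == "function_result" and item.call_id
    }
    seen, calls = set(), []
    for message in messages:
        for item in message.contents:
            if item.type != "function_call":
                continue
            if item.call_id and (item.call_id in answered or item.call_id in seen):
                continue
            if item.call_id:
                seen.add(item.call_id)
            calls.append(item)
    return calls

call = lambda cid: SimpleNamespace(type="function_call", call_id=cid)
result = lambda cid: SimpleNamespace(type="function_result", call_id=cid)
msgs = [
    SimpleNamespace(contents=[call("a"), call("a"), call("b")]),  # duplicate "a"
    SimpleNamespace(contents=[result("b")]),                      # "b" already answered
]
assert [c.call_id for c in extract_function_calls(msgs)] == ["a"]
```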
@@ -1822,27 +1838,22 @@ def _handle_function_call_results(
 
     if had_errors:
         errors_in_a_row += 1
-        if errors_in_a_row >= max_errors:
+        reached_error_limit = errors_in_a_row >= max_errors
+        if reached_error_limit:
             logger.warning(
                 "Maximum consecutive function call errors reached (%d). "
                 "Stopping further function calls for this request.",
                 max_errors,
             )
-            return {
-                "action": "stop",
-                "errors_in_a_row": errors_in_a_row,
-                "result_message": None,
-                "update_role": None,
-                "function_call_results": None,
-            }
     else:
         errors_in_a_row = 0
+        reached_error_limit = False
 
     result_message = Message(role="tool", contents=function_call_results)
     response.messages.append(result_message)
     fcc_messages.extend(response.messages)
     return {
-        "action": "continue",
+        "action": "stop" if reached_error_limit else "continue",
         "errors_in_a_row": errors_in_a_row,
         "result_message": result_message,
         "update_role": "tool",
@@ -2025,6 +2036,7 @@ def get_response(
            middleware_pipeline=function_middleware_pipeline,
        )
        filtered_kwargs = {k: v for k, v in kwargs.items() if k != "session"}
+
        # Make options mutable so we can update conversation_id during function invocation loop
        mutable_options: dict[str, Any] = dict(options) if options else {}
        # Remove additional_function_arguments from options passed to underlying chat client
20902102
if result["action"] == "return":
20912103
return response
20922104
if result["action"] == "stop":
2093-
break
2105+
# Error threshold reached: force a final non-tool turn so
2106+
# function_call_output items are submitted before exit.
2107+
mutable_options["tool_choice"] = "none"
20942108
errors_in_a_row = result["errors_in_a_row"]
20952109

20962110
# When tool_choice is 'required', reset tool_choice after one iteration to avoid infinite loops
@@ -2157,6 +2171,7 @@ async def _stream() -> AsyncIterable[ChatResponseUpdate]:
21572171
)
21582172
errors_in_a_row = approval_result["errors_in_a_row"]
21592173
if approval_result["action"] == "stop":
2174+
mutable_options["tool_choice"] = "none"
21602175
return
21612176

21622177
inner_stream = await _ensure_response_stream(
@@ -2205,7 +2220,11 @@ async def _stream() -> AsyncIterable[ChatResponseUpdate]:
22052220
contents=result["function_call_results"] or [],
22062221
role=role,
22072222
)
2208-
if result["action"] != "continue":
2223+
if result["action"] == "stop":
2224+
# Error threshold reached: submit collected function_call_output
2225+
# items once more with tools disabled.
2226+
mutable_options["tool_choice"] = "none"
2227+
elif result["action"] != "continue":
22092228
return
22102229

22112230
# When tool_choice is 'required', reset the tool_choice after one iteration to avoid infinite loops

python/packages/core/agent_framework/_types.py

Lines changed: 2 additions & 0 deletions
@@ -531,6 +531,7 @@ def from_text(
    def from_text_reasoning(
        cls: type[ContentT],
        *,
+        id: str | None = None,
        text: str | None = None,
        protected_data: str | None = None,
        annotations: Sequence[Annotation] | None = None,
@@ -540,6 +541,7 @@ def from_text_reasoning(
        """Create text reasoning content."""
        return cls(
            "text_reasoning",
+            id=id,
            text=text,
            protected_data=protected_data,
            annotations=annotations,

python/packages/core/agent_framework/_workflows/_agent_executor.py

Lines changed: 5 additions & 5 deletions
@@ -144,10 +144,10 @@ async def from_response(
        immediately run the agent to produce a new response.
        """
        # Replace cache with full conversation if available, else fall back to agent_response messages.
-        if prior.full_conversation is not None:
-            self._cache = list(prior.full_conversation)
-        else:
-            self._cache = list(prior.agent_response.messages)
+        source_messages = (
+            prior.full_conversation if prior.full_conversation is not None else prior.agent_response.messages
+        )
+        self._cache = list(source_messages)
        await self._run_agent_and_emit(ctx)
 
    @handler
@@ -311,7 +311,7 @@ async def _run_agent_and_emit(
        # Snapshot current conversation as cache + latest agent outputs.
        # Do not append to prior snapshots: callers may provide full-history messages
        # in request.messages, and extending would duplicate prior turns.
-        self._full_conversation = list(self._cache) + (list(response.messages) if response else [])
+        self._full_conversation = [*self._cache, *(list(response.messages) if response else [])]
 
        if response is None:
            # Agent did not complete (e.g., waiting for user input); do not emit response

python/packages/core/agent_framework/openai/_responses_client.py

Lines changed: 57 additions & 29 deletions
@@ -908,11 +908,16 @@ def _prepare_message_for_openai(
                "type": "message",
                "role": message.role,
            }
+            # Reasoning items are only valid in input when they directly preceded a function_call
+            # in the same response. Including a reasoning item that preceded a text response
+            # (i.e. no function_call in the same message) causes an API error:
+            # "reasoning was provided without its required following item."
+            has_function_call = any(c.type == "function_call" for c in message.contents)
            for content in message.contents:
                match content.type:
                    case "text_reasoning":
-                        # Reasoning items must be sent back as top-level input items
-                        # for reasoning models that require them alongside function_calls
+                        if not has_function_call:
+                            continue  # reasoning not followed by a function_call is invalid in input
                        reasoning = self._prepare_content_for_openai(message.role, content, call_id_to_id)  # type: ignore[arg-type]
                        if reasoning:
                            all_messages.append(reasoning)
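The gating above is a simple co-occurrence check; a standalone sketch with plain dicts in place of the framework's content types:

```python
# Sketch: only serialize reasoning items from messages that also carry a
# function_call, mirroring the has_function_call guard above.
def serialize_contents(contents):
    has_function_call = any(c["type"] == "function_call" for c in contents)
    out = []
    for c in contents:
        if c["type"] == "text_reasoning" and not has_function_call:
            continue  # reasoning without a following function_call is rejected by the API
        out.append(c)
    return out

with_call = [{"type": "text_reasoning"}, {"type": "function_call"}]
text_only = [{"type": "text_reasoning"}, {"type": "text"}]
assert [c["type"] for c in serialize_contents(with_call)] == ["text_reasoning", "function_call"]
assert [c["type"] for c in serialize_contents(text_only)] == ["text"]
```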
@@ -961,26 +966,19 @@ def _prepare_content_for_openai(
                    "text": content.text,
                }
            case "text_reasoning":
-                ret: dict[str, Any] = {
-                    "type": "reasoning",
-                    "summary": {
-                        "type": "summary_text",
-                        "text": content.text,
-                    },
-                }
+                ret: dict[str, Any] = {"type": "reasoning", "summary": []}
+                if content.id:
+                    ret["id"] = content.id
                props: dict[str, Any] | None = getattr(content, "additional_properties", None)
                if props:
-                    if reasoning_id := props.get("reasoning_id"):
-                        ret["id"] = reasoning_id
                    if status := props.get("status"):
                        ret["status"] = status
                    if reasoning_text := props.get("reasoning_text"):
-                        ret["content"] = {
-                            "type": "reasoning_text",
-                            "text": reasoning_text,
-                        }
+                        ret["content"] = [{"type": "reasoning_text", "text": reasoning_text}]
                    if encrypted_content := props.get("encrypted_content"):
                        ret["encrypted_content"] = encrypted_content
+                if content.text:
+                    ret["summary"].append({"type": "summary_text", "text": content.text})
                return ret
            case "data" | "uri":
                if content.has_top_level_media_type("image"):
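The corrected wire shape — summary and content as arrays of typed parts — can be checked in isolation. The field values below are illustrative, not taken from a real response:

```python
# Sketch of the reasoning input item shape the Responses API expects:
# "summary" and "content" are arrays of typed parts, not single objects.
reasoning_item = {
    "type": "reasoning",
    "id": "rs_abc123",
    "summary": [{"type": "summary_text", "text": "Compared both options."}],
    "content": [{"type": "reasoning_text", "text": "Option A is cheaper..."}],
}

# The pre-fix (broken) shape nested a single object instead of an array:
broken = {"type": "reasoning", "summary": {"type": "summary_text", "text": "..."}}

assert isinstance(reasoning_item["summary"], list)
assert isinstance(reasoning_item["content"], list)
assert not isinstance(broken["summary"], list)  # this shape triggered API errors
```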
@@ -1189,30 +1187,45 @@ def _parse_response_from_openai(
                        )
                    )
                case "reasoning":  # ResponseOutputReasoning
-                    reasoning_id = getattr(item, "id", None)
-                    if hasattr(item, "content") and item.content:
-                        for index, reasoning_content in enumerate(item.content):
+                    added_reasoning = False
+                    if item_content := getattr(item, "content", None):
+                        for index, reasoning_content in enumerate(item_content):
                            additional_properties: dict[str, Any] = {}
-                            if reasoning_id:
-                                additional_properties["reasoning_id"] = reasoning_id
                            if hasattr(item, "summary") and item.summary and index < len(item.summary):
                                additional_properties["summary"] = item.summary[index]
                            contents.append(
                                Content.from_text_reasoning(
+                                    id=item.id,
                                    text=reasoning_content.text,
                                    raw_representation=reasoning_content,
                                    additional_properties=additional_properties or None,
                                )
                            )
-                    if hasattr(item, "summary") and item.summary:
-                        for summary in item.summary:
+                        added_reasoning = True
+                    if item_summary := getattr(item, "summary", None):
+                        for summary in item_summary:
                            contents.append(
                                Content.from_text_reasoning(
+                                    id=item.id,
                                    text=summary.text,
                                    raw_representation=summary,  # type: ignore[arg-type]
-                                    additional_properties={"reasoning_id": reasoning_id} if reasoning_id else None,
                                )
                            )
+                        added_reasoning = True
+                    if not added_reasoning:
+                        # Reasoning item with no visible text (e.g. encrypted reasoning).
+                        # Always emit an empty marker so co-occurrence detection can be done.
+                        additional_properties_empty: dict[str, Any] = {}
+                        if encrypted := getattr(item, "encrypted_content", None):
+                            additional_properties_empty["encrypted_content"] = encrypted
+                        contents.append(
+                            Content.from_text_reasoning(
+                                id=item.id,
+                                text="",
+                                raw_representation=item,
+                                additional_properties=additional_properties_empty or None,
+                            )
+                        )
                case "code_interpreter_call":  # ResponseOutputCodeInterpreterCall
                    call_id = getattr(item, "call_id", None) or getattr(item, "id", None)
                    outputs: list[Content] = []
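The always-emit-a-marker behaviour can be sketched with plain dicts standing in for the SDK's output item and the framework's text_reasoning Content:

```python
# Sketch: parse a reasoning output item into content entries, always
# emitting at least one (possibly empty-text) marker so downstream logic
# can still detect that reasoning co-occurred in this response.
def parse_reasoning(item):
    entries = []
    for part in item.get("content") or []:
        entries.append({"id": item["id"], "text": part["text"]})
    for summary in item.get("summary") or []:
        entries.append({"id": item["id"], "text": summary["text"]})
    if not entries:
        # Encrypted/hidden reasoning: no visible text at all.
        entries.append({"id": item["id"], "text": ""})
    return entries

visible = {"id": "rs_1", "content": [{"text": "step 1"}], "summary": []}
hidden = {"id": "rs_2", "content": None, "summary": None}
assert parse_reasoning(visible) == [{"id": "rs_1", "text": "step 1"}]
assert parse_reasoning(hidden) == [{"id": "rs_2", "text": ""}]
```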
@@ -1427,36 +1440,36 @@ def _parse_chunk_from_openai(
            case "response.reasoning_text.delta":
                contents.append(
                    Content.from_text_reasoning(
+                        id=event.item_id,
                        text=event.delta,
                        raw_representation=event,
-                        additional_properties={"reasoning_id": event.item_id},
                    )
                )
                metadata.update(self._get_metadata_from_response(event))
            case "response.reasoning_text.done":
                contents.append(
                    Content.from_text_reasoning(
+                        id=event.item_id,
                        text=event.text,
                        raw_representation=event,
-                        additional_properties={"reasoning_id": event.item_id},
                    )
                )
                metadata.update(self._get_metadata_from_response(event))
            case "response.reasoning_summary_text.delta":
                contents.append(
                    Content.from_text_reasoning(
+                        id=event.item_id,
                        text=event.delta,
                        raw_representation=event,
-                        additional_properties={"reasoning_id": event.item_id},
                    )
                )
                metadata.update(self._get_metadata_from_response(event))
            case "response.reasoning_summary_text.done":
                contents.append(
                    Content.from_text_reasoning(
+                        id=event.item_id,
                        text=event.text,
                        raw_representation=event,
-                        additional_properties={"reasoning_id": event.item_id},
                    )
                )
                metadata.update(self._get_metadata_from_response(event))
@@ -1630,11 +1643,10 @@ def _parse_chunk_from_openai(
                                )
                            case "reasoning":  # ResponseOutputReasoning
                                reasoning_id = getattr(event_item, "id", None)
+                                added_reasoning = False
                                if hasattr(event_item, "content") and event_item.content:
                                    for index, reasoning_content in enumerate(event_item.content):
                                        additional_properties: dict[str, Any] = {}
-                                        if reasoning_id:
-                                            additional_properties["reasoning_id"] = reasoning_id
                                        if (
                                            hasattr(event_item, "summary")
                                            and event_item.summary
@@ -1643,11 +1655,27 @@ def _parse_chunk_from_openai(
                                            additional_properties["summary"] = event_item.summary[index]
                                        contents.append(
                                            Content.from_text_reasoning(
+                                                id=reasoning_id or None,
                                                text=reasoning_content.text,
                                                raw_representation=reasoning_content,
                                                additional_properties=additional_properties or None,
                                            )
                                        )
+                                    added_reasoning = True
+                                if not added_reasoning:
+                                    # Reasoning item with no visible text (e.g. encrypted reasoning).
+                                    # Always emit an empty marker so co-occurrence detection can occur.
+                                    additional_properties_empty: dict[str, Any] = {}
+                                    if encrypted := getattr(event_item, "encrypted_content", None):
+                                        additional_properties_empty["encrypted_content"] = encrypted
+                                    contents.append(
+                                        Content.from_text_reasoning(
+                                            id=reasoning_id or None,
+                                            text="",
+                                            raw_representation=event_item,
+                                            additional_properties=additional_properties_empty or None,
+                                        )
+                                    )
                            case _:
                                logger.debug("Unparsed event of type: %s: %s", event.type, event)
                case "response.function_call_arguments.delta":
