Python: fix reasoning model workflow handoff and history serialization#4083
Merged
TaoChenOSU merged 11 commits into microsoft:main on Feb 19, 2026
Conversation
Contributor
Pull request overview
This PR fixes workflow + function-calling failures when using reasoning-capable models with the OpenAI/Azure Responses API by tightening how reasoning items are emitted/serialized and by preventing duplicate history replay across agent handoffs.
Changes:
- Adjusts Responses API parsing/serialization to (a) only include `reasoning` input items when paired with a `function_call`, (b) always emit a `text_reasoning` marker (even empty) for hidden/encrypted reasoning, and (c) serialize `summary` as an array.
- Updates workflow execution to clear `service_session_id` when explicitly replaying full history, to avoid "Duplicate item found" errors.
- Improves function-invocation behavior across multi-message responses and adds/expands tests (unit + integration) covering these scenarios.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| python/packages/core/agent_framework/openai/_responses_client.py | Updates reasoning item parsing and input serialization rules for Responses API. |
| python/packages/core/agent_framework/_workflows/_agent_executor.py | Clears service_session_id when replaying explicit history into an executor. |
| python/packages/core/agent_framework/_tools.py | Improves function-call extraction across multiple messages and adjusts stop-path handling. |
| python/packages/core/tests/workflow/test_full_conversation.py | Adds workflow tests for handoff history and service_session_id clearing. |
| python/packages/core/tests/core/test_function_invocation_logic.py | Adds tests for multi-message function calls and stop-path conversation_id behavior. |
| python/packages/core/tests/azure/test_azure_responses_client.py | Adds an integration test that validates minimal workflow handoff across reasoning vs non-reasoning deployments. |
| python/samples/05-end-to-end/workflow_evaluation/run_evaluation.py | Updates the default workflow deployment name to a reasoning model for the evaluation sample. |
| python/samples/02-agents/conversations/redis_chat_message_store_session.py | Makes Redis URL configurable via REDIS_URL env var and updates sample messaging. |
Review comments (outdated, resolved) on:
- python/packages/core/agent_framework/openai/_responses_client.py (2)
- python/packages/core/agent_framework/_workflows/_agent_executor.py
Force-pushed from 3061161 to 3689393
dmytrostruk (Member) approved these changes on Feb 19, 2026
TaoChenOSU reviewed on Feb 19, 2026, with comments (now outdated, resolved) on:
- python/packages/core/agent_framework/_workflows/_agent_executor.py (2)
- python/samples/03-workflows/human-in-the-loop/agents_with_HITL.py
Force-pushed from c0ba53f to bd2d608
… handoff

When a reasoning model (e.g. gpt-5-mini) runs as Agent 1 in a workflow, its response includes text_reasoning items (with server-scoped IDs like rs_XXXX) and function_call items. Forwarding these to Agent 2 in a fresh conversation caused API errors because the reasoning/call IDs are scoped to the original stored response context.

Changes:
- Strip 'function_call', 'text_reasoning', 'function_approval_request', and 'function_approval_response' from handoff messages in _agent_executor.py
- Keep 'function_result' so the actual tool output content is preserved for the next agent's context
- Update unit tests to reflect that function_result messages survive handoff (messages grow from 2 to 3: user, tool(result), assistant(summary))
- Fix incorrect test assertions in test_function_invocation_stop_clears_* that assumed the client layer updates session.service_session_id
- Fix _extract_function_calls to search all messages with call_id deduplication, and fix the error-limit stop path to submit function_call_output items before halting (via a tool_choice=none cleanup call)

Relates to: microsoft#4047
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
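The stripping rule this commit describes can be sketched as below. The helper name `prepare_handoff_contents` and the dict shape of content items are illustrative, not the framework's actual API, and note that a later commit in this PR removes the handoff-stripping helper in favor of the `service_session_id` reset:

```python
# Content types whose ids are scoped to the original stored response
# (rs_..., fc_...) and are therefore invalid in a fresh conversation.
STRIP_TYPES = {
    "function_call",
    "text_reasoning",
    "function_approval_request",
    "function_approval_response",
}

def prepare_handoff_contents(contents: list[dict]) -> list[dict]:
    """Drop server-scoped items, but keep function_result so the tool
    output stays visible in the next agent's context."""
    return [c for c in contents if c["type"] not in STRIP_TYPES]
```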
Fixes multiple related issues when using reasoning models (gpt-5-mini,
gpt-5.2) in multi-agent workflows that chain agents via from_response
or replay full conversation history via AgentExecutorRequest.
## Reasoning items always emitted on output_item.added
When a reasoning model produces encrypted or hidden reasoning (no
visible text), the Responses API still fires a reasoning output item
without any reasoning_text.delta events. Previously no text_reasoning
Content was emitted in that case, making it invisible to downstream
logic. Both the non-streaming (_parse_response_from_openai) and
streaming (output_item.added) paths now always emit at least one
text_reasoning Content — with empty text if no content is available —
so co-occurrence detection and serialization guards work reliably.
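The "always emit a marker" rule can be sketched as follows. The names (`Content`, `parse_reasoning_item`) and the raw-item dict shape are illustrative stand-ins for the framework's internal types:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Content:
    type: str
    text: str = ""
    additional_properties: dict[str, Any] = field(default_factory=dict)

def parse_reasoning_item(item: dict[str, Any]) -> list[Content]:
    """Emit at least one text_reasoning Content, even when the model's
    reasoning is hidden/encrypted and the item carries no visible text."""
    contents = [
        Content(type="text_reasoning", text=part.get("text", ""))
        for part in item.get("content", [])
    ]
    if not contents:
        # Encrypted/hidden reasoning: emit an empty marker so downstream
        # co-occurrence detection and serialization guards still see it.
        contents = [Content(type="text_reasoning", text="")]
    for c in contents:
        # Preserve the server-scoped id and encrypted payload, if any.
        c.additional_properties["reasoning_id"] = item.get("id")
        if item.get("encrypted_content"):
            c.additional_properties["encrypted_content"] = item["encrypted_content"]
    return contents
```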
## Reasoning items only serialized when paired with a function_call
The Responses API only accepts reasoning items in input when they
directly preceded a function_call in the original response. Sending a
reasoning item that preceded a text response (no tool call) causes:
"reasoning was provided without its required following item"
_prepare_message_for_openai now checks has_function_call per message
and skips text_reasoning serialization when there is no accompanying
function_call.
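A minimal sketch of that guard (the real `_prepare_message_for_openai` operates on framework message objects; the dict shape here is illustrative):

```python
def serialize_message_contents(contents: list[dict]) -> list[dict]:
    # A reasoning input item is only valid when a function_call from the
    # same message accompanies it; otherwise the API rejects the request
    # with "reasoning was provided without its required following item".
    has_function_call = any(c["type"] == "function_call" for c in contents)
    return [
        c for c in contents
        if c["type"] != "text_reasoning" or has_function_call
    ]
```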
## summary field is an array, not an object
The reasoning item summary field sent to the Responses API must be an
array of objects ([{"type": "summary_text", "text": ...}]), not a
single object. Fixed _prepare_content_for_openai accordingly.
## service_session_id cleared when explicit history is provided
When a workflow coordinator replays a full conversation (including
function calls from a previous agent run) back to an executor via
AgentExecutorRequest or from_response, the executor's session still
held a service_session_id (previous_response_id) from the prior run.
The API then received the same function-call items twice — once from
previous_response_id (server-stored) and once from the explicit input —
causing: "Duplicate item found with id fc_...".
AgentExecutor.run (when should_respond=True) and from_response now
reset self._session.service_session_id = None before running so that
explicit input is the sole source of conversation context.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…cit history replay

Replace the implicit 'always clear service_session_id when should_respond=True' with an explicit opt-in field on AgentExecutorRequest. The old approach used should_respond=True as a proxy for 'full history replay', but that conflates two distinct intents:
- Orchestrations group chat sends should_respond=True with an empty/single-message list (not a full replay), unnecessarily clearing service_session_id.
- HITL / feedback coordinators send the full prior conversation and truly need a fresh service session ID to avoid duplicate-item API errors.

Changes:
- Add AgentExecutorRequest.reset_service_session: bool = False
- AgentExecutor.run only clears service_session_id when this flag is True
- AgentExecutor.from_response unchanged (always clears; always full conversation)
- Set reset_service_session=True in all full-history-replay call sites: agents_with_HITL.py, azure_chat_agents_tool_calls_with_feedback.py, autogen-migration round-robin coordinator, tau2 runner
- Update _FullHistoryReplayCoordinator test helper to pass the flag

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from 2989b60 to d9193a2
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TaoChenOSU reviewed on Feb 19, 2026, with a comment (resolved) on python/samples/03-workflows/agents/azure_chat_agents_tool_calls_with_feedback.py
TaoChenOSU approved these changes on Feb 19, 2026
Summary
Fixes multiple related failures when using reasoning models (gpt-5-mini, gpt-5.2) in multi-agent workflows. The root issues are all about how reasoning items from the Responses API are emitted, serialized, and carried into subsequent agent runs.
Closes #4047
Problems Fixed
1. "reasoning was provided without its required following item"
The Responses API only accepts a `reasoning` item in `input` when it directly precedes a `function_call`. Sending a reasoning item that preceded a text response (no tool call) causes an API error.

Fix: `_prepare_message_for_openai` now checks whether the message contains a `function_call`. `text_reasoning` content is only serialized as a `reasoning` input item when a `function_call` is also present in the same message.

2. Reasoning items never emitted for encrypted/hidden reasoning

When a reasoning model produces encrypted or hidden reasoning, the `output_item.added` event fires with an empty `content` list and no `reasoning_text.delta` events follow. Previously, no `text_reasoning` Content was emitted, making it invisible to downstream serialization logic.

Fix: Both `_parse_response_from_openai` (non-streaming) and the `output_item.added` handler (streaming) now always emit at least one `text_reasoning` Content, even when the text is empty. The `reasoning_id` and `encrypted_content` (if present) are stored in `additional_properties`.

3. `summary` field must be an array, not an object

The `summary` field on a `reasoning` input item must be an array of objects (`[{"type": "summary_text", "text": ...}]`), not a single object. This caused a 400 `invalid_type` error.

Fix: `_prepare_content_for_openai` now wraps `summary` in a list. `summary` is omitted entirely when there is no visible text (e.g. encrypted reasoning, where only `encrypted_content` is sent).

Files Changed

- `packages/core/agent_framework/openai/_responses_client.py`: always emit `text_reasoning` on reasoning output items; fix `summary` to be an array; skip reasoning serialization when no `function_call` in the same message
- `packages/core/agent_framework/_workflows/_agent_executor.py`: clear `service_session_id` in `run` and `from_response` handlers; remove no-op `_prepare_handoff_messages`
- `packages/core/tests/workflow/test_full_conversation.py`: add `test_run_request_with_full_history_clears_service_session_id` and `test_from_response_clears_service_session_id` (TDD: fail without fix, pass with fix)
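Putting the serialization rules together, a valid `input` sequence for a prior assistant turn that called a tool looks roughly like this. Ids, tool name, and payloads are illustrative:

```python
input_items = [
    # Rule 1: a reasoning item must directly precede its function_call.
    # Rule 3: summary is an array of objects, never a bare object.
    {
        "type": "reasoning",
        "id": "rs_abc123",
        "summary": [{"type": "summary_text", "text": "Need the weather tool."}],
    },
    {
        "type": "function_call",
        "call_id": "fc_abc123",
        "name": "get_weather",
        "arguments": '{"city": "Seattle"}',
    },
    {
        "type": "function_call_output",
        "call_id": "fc_abc123",
        "output": '{"temp_c": 11}',
    },
]
```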