Python: Stop emitting duplicate reasoning content from OpenAI `response.reasoning_text.done` and `response.reasoning_summary_text.done` events by moonbox3 · Pull Request #5162 · microsoft/agent-framework

moonbox3 · 2026-04-08T10:43:04Z

Motivation and Context

The OpenAI Responses API sends both .delta (incremental chunks) and .done (full accumulated text) events for reasoning content. The chat client was converting both into identical Content objects with type text_reasoning, causing downstream consumers like ag-ui to duplicate the reasoning text and emit extra events.

Fixes #5157

Description

The root cause was in _chat_client.py: the response.reasoning_text.done and response.reasoning_summary_text.done handlers created Content.from_text_reasoning objects identical to the delta events, so the full accumulated text was appended on top of already-streamed deltas. The fix removes the Content emission from both .done handlers (keeping only metadata extraction), since the delta events already stream the complete text incrementally. Additionally, a guard was added to _parse_structured_response_value to handle empty text, and regression tests were added in the ag-ui and core test suites to verify that reasoning deltas accumulate correctly without duplication.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the Contribution Guidelines
All unit tests pass, and I have added new tests where possible
Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

…crosoft#5157) The OpenAI Responses API sends both reasoning_text.delta (incremental chunks) and reasoning_text.done (full accumulated text) events. The chat client was emitting Content for both, causing ag-ui to append the full done text onto already-accumulated delta text, producing duplicated reasoning output. Stop emitting Content for reasoning_text.done and reasoning_summary_text.done events, matching how output_text.done is already handled (not emitted). The deltas contain all the content; the done event is redundant. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

moonbox3 · 2026-04-08T10:45:31Z

Python Test Coverage Report •

File	Stmts	Miss	Cover	Missing
packages/ag-ui/agent_framework_ag_ui
_agent_run.py	487	51	89%	150–157, 196–197, 204, 301, 313, 317, 319–320, 336, 363–364, 400–404, 518–520, 532–534, 632, 640, 753, 755–756, 809, 826–827, 834, 902, 925, 933, 935, 938, 944, 997, 1000, 1010–1011, 1018, 1064
_run_common.py	286	10	96%	312–313, 319–324, 625–626
_workflow_run.py	509	28	94%	181, 217–220, 248, 253, 281, 291, 302, 307, 310, 323, 363–364, 385, 434, 454, 462, 465, 470, 485, 558, 600–601, 672, 715, 734
packages/azure-cosmos/agent_framework_azure_cosmos
_checkpoint_storage.py	135	1	99%	396
packages/openai/agent_framework_openai
_chat_client.py	867	128	85%	522–525, 529–530, 536–537, 547–548, 555, 570–576, 597, 605, 628, 746, 845, 904, 906, 908, 910, 976, 990, 1070, 1080, 1085, 1128, 1205, 1235, 1292, 1386, 1391, 1395–1397, 1401–1402, 1472, 1501, 1507, 1517, 1523, 1528, 1534, 1539–1540, 1601, 1623–1624, 1639–1640, 1658–1659, 1700–1703, 1865, 1903–1904, 1920, 1922, 2001–2009, 2039, 2146, 2181, 2196, 2216–2226, 2239, 2250–2254, 2268, 2282–2293, 2302, 2334–2337, 2345–2346, 2348–2350, 2364–2366, 2376–2377, 2383, 2398
TOTAL	27202	3187	88%

Python Unit Test Overview

Tests	Skipped	Failures	Errors	Time
5428	20 💤	0 ❌	0 🔥	1m 26s ⏱️

Copilot

Pull request overview

This PR fixes duplicated reasoning text in the Python OpenAI streaming client by ensuring response.reasoning_text.done and response.reasoning_summary_text.done events no longer emit text_reasoning content (since the corresponding .delta events already stream the full text incrementally). This prevents downstream consumers (notably ag-ui) from double-appending reasoning content.

Changes:

Stop emitting Content.from_text_reasoning(...) for response.reasoning_text.done and response.reasoning_summary_text.done events; keep metadata extraction only.
Update OpenAI chat client tests to assert .done events produce no content (regression coverage for duplication).
Add ag-ui regression tests to verify reasoning delta accumulation behavior and per-delta event emission.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File	Description
python/packages/openai/agent_framework_openai/_chat_client.py	Avoid duplicate `text_reasoning` content by not emitting reasoning content for `.done` events (metadata-only).
python/packages/openai/tests/openai/test_openai_chat_client.py	Update tests to validate that reasoning `.done` events do not produce `contents`.
python/packages/ag-ui/tests/ag_ui/test_run.py	Add regression tests ensuring reasoning deltas accumulate without duplication and emit one content event per delta.

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 88%

✓ Correctness

This PR correctly removes the content emission from response.reasoning_text.done and response.reasoning_summary_text.done event handlers in the OpenAI chat client. The .done events carry the full accumulated text that was already streamed via .delta events, so emitting content from both would cause duplication in downstream consumers like the ag-ui reasoning accumulation (_emit_text_reasoning), which appends each delta to flow.accumulated_reasoning. The tests are updated consistently, and new tests in ag-ui verify that delta accumulation works correctly without duplication. The metadata extraction from .done events is preserved, which is correct since metadata should still be captured.

✓ Security Reliability

This PR fixes a content duplication bug where both response.reasoning_text.delta and response.reasoning_text.done (and their summary counterparts) emitted Content.from_text_reasoning, causing downstream AG-UI reasoning accumulation to duplicate the full reasoning text. The fix correctly removes the content emission from .done events while preserving metadata extraction. The .done events in the OpenAI Responses API carry the complete accumulated text (which was already streamed via deltas), so emitting it again was redundant. The new tests in test_run.py properly validate accumulation behavior, and the updated tests in test_openai_chat_client.py correctly assert zero contents from done events. No security or reliability issues found.

✓ Test Coverage

The PR correctly removes duplicate content emission from reasoning done events in the OpenAI chat client and updates corresponding tests. The done-event tests are properly updated to assert empty contents while still verifying metadata propagation. Two new ag-ui tests validate accumulation behavior and per-delta event emission. However, there's no integration-level test that sends a realistic sequence (multiple deltas followed by a done event) through _parse_chunk_from_openai to verify that the combined output has no duplicated reasoning content—the existing tests verify each event type in isolation. The new test_reasoning_deltas_accumulate_correctly test is also somewhat redundant with the pre-existing test_reasoning_accumulates_incremental_deltas test which does nearly the same thing with 3 deltas.

✓ Design Approach

The fix correctly addresses a real duplication bug: the response.reasoning_text.done and response.reasoning_summary_text.done events were emitting the full accumulated text as a new Content.from_text_reasoning item, which the ag-ui _emit_text_reasoning accumulator then appended on top of the already-accumulated delta content, producing doubled output. Suppressing content emission from done events is pragmatic and correct for the common streaming case. However, the fix introduces a fragile assumption: if a done event arrives without any preceding delta events (e.g., if the stream resumes from a checkpoint or if the OpenAI API ever batches very short reasoning into only a done event), the reasoning content is silently dropped with no fallback path. A more robust design would be to track whether deltas were received per item_id and use the done event as a fallback source of truth. Additionally, the new ag-ui tests added in test_run.py verify delta accumulation behavior that was already tested by the surrounding test class, rather than directly testing the fix itself (which is in _chat_client.py). The fix also creates an asymetry with response.code_interpreter_call_code.done, which still emits content from the done event, potentially warranting a similar treatment.

Suggestions

Consider adding an integration-level test in test_openai_chat_client.py that sends a sequence of response.reasoning_text.delta events followed by a response.reasoning_text.done event through _parse_chunk_from_openai and asserts the total collected contents contain only the delta-produced items with no duplication from the done event. This would validate the bug-fix scenario end-to-end rather than testing each event type in isolation.
The new test_reasoning_deltas_accumulate_correctly test in test_run.py (line 1505) is nearly identical to the existing test_reasoning_accumulates_incremental_deltas (line ~1463). Consider removing the redundant test or differentiating it by testing an edge case (e.g., empty-string deltas or a single delta) to more directly demonstrate the deduplication fix.
The fix assumes done events are always preceded by delta events. A more robust approach would be to track whether any deltas were received for a given item_id and emit content from the done event only when no deltas were observed, avoiding silent content loss if the streaming model changes or the API is used in a non-standard way.
The response.code_interpreter_call_code.done case still emits content from the done event, creating an asymmetry. If the ag-ui layer ever adds accumulation support for code interpreter content similar to reasoning, this will cause the same duplication bug. A comment noting the intentional asymmetry would help future maintainers.

Automated review by moonbox3's agents

python/packages/openai/agent_framework_openai/_chat_client.py

python/packages/ag-ui/tests/ag_ui/test_run.py

…bserved (microsoft#5157) Address PR review feedback: - Track item_ids that received reasoning deltas via seen_reasoning_delta_item_ids set - Emit content from done events only when no deltas were received for the item_id, preventing silent content loss on stream resumption - Add comment documenting code_interpreter done event asymmetry - Replace redundant ag-ui test with deduplication-focused test - Add integration test for delta+done sequence in OpenAI chat client tests - Add fallback path tests for done events without preceding deltas Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…esponse.reasoning_text.delta" and "response.reasoning_text.done" both get exposed as "text_reasoning"

moonbox3

Automated Code Review

Reviewers: 4 | Confidence: 97% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach

Automated review by moonbox3's agents

python/packages/ag-ui/tests/ag_ui/test_run.py

…oft#5157) _emit_text_reasoning now follows the same streaming pattern as _emit_text: - Emits ReasoningStartEvent/ReasoningMessageStartEvent only on the first delta for a given message_id - Emits only ReasoningMessageContentEvent for subsequent deltas - Defers ReasoningMessageEndEvent/ReasoningEndEvent until _close_reasoning_block is called (on content type switch or end-of-run) This produces the correct protocol pattern: ReasoningStartEvent ReasoningMessageStartEvent ReasoningMessageContentEvent(delta1) ReasoningMessageContentEvent(delta2) ReasoningMessageEndEvent ReasoningEndEvent Instead of wrapping every delta in a full Start→End sequence. Backward compatibility is preserved: calling _emit_text_reasoning without a flow argument still produces the full sequence per call. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Move inline import of TextMessageContentEvent to the top-level import block and ensure alphabetical ordering to satisfy ruff I001 rule. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…flowEvent The 'event' variable was already typed as WorkflowEvent[Any] from the async for loop at line 590. Reusing it in the _close_reasoning_block loop (which returns list[BaseEvent]) caused an incompatible assignment error. Renamed to 'reasoning_evt' to avoid the conflict. Fixes microsoft#5162 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

python/pytest.xml

Copilot AI review requested due to automatic review settings April 8, 2026 10:43

moonbox3 self-assigned this Apr 8, 2026

moonbox3 added the python label Apr 8, 2026

Copilot started reviewing on behalf of moonbox3 April 8, 2026 10:44 View session

Copilot AI reviewed Apr 8, 2026

View reviewed changes

moonbox3 commented Apr 8, 2026

View reviewed changes

python/packages/openai/agent_framework_openai/_chat_client.py Outdated Show resolved Hide resolved

python/packages/ag-ui/tests/ag_ui/test_run.py Show resolved Hide resolved

Copilot and others added 2 commits April 8, 2026 10:51

Address review feedback for microsoft#5157: Python: [Bug]: "type": "r…

b3b4343

…esponse.reasoning_text.delta" and "response.reasoning_text.done" both get exposed as "text_reasoning"

moonbox3 commented Apr 8, 2026

View reviewed changes

eavanvalkenburg approved these changes Apr 8, 2026

View reviewed changes

moonbox3 commented Apr 9, 2026

View reviewed changes

python/packages/ag-ui/tests/ag_ui/test_run.py Show resolved Hide resolved

Copilot and others added 7 commits April 9, 2026 02:39

Fix import ordering lint error in AG-UI test file (microsoft#5157)

fc5c803

Move inline import of TextMessageContentEvent to the top-level import block and ensure alphabetical ordering to satisfy ruff I001 rule. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Merge remote-tracking branch 'upstream/main' into agent/fix-5157-2

34cfd69

Address review feedback for microsoft#5157: review comment fixes

85b36cb

narrow test result reporting to explicit pytest JUnit XML

1c2c739

Fix test args

48d48b0

moonbox3 requested review from TaoChenOSU and chetantoshniwal April 9, 2026 06:30

SergeyMenshykh approved these changes Apr 9, 2026

View reviewed changes

python/pytest.xml Show resolved Hide resolved

westey-m approved these changes Apr 9, 2026

View reviewed changes

eavanvalkenburg added this pull request to the merge queue Apr 9, 2026

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Stop emitting duplicate reasoning content from OpenAI `response.reasoning_text.done` and `response.reasoning_summary_text.done` events#5162

Python: Stop emitting duplicate reasoning content from OpenAI `response.reasoning_text.done` and `response.reasoning_summary_text.done` events#5162
moonbox3 wants to merge 10 commits intomicrosoft:mainfrom
moonbox3:agent/fix-5157-2

moonbox3 commented Apr 8, 2026

Uh oh!

moonbox3 commented Apr 8, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

moonbox3 left a comment

Uh oh!

Uh oh!

Uh oh!

moonbox3 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

moonbox3 commented Apr 8, 2026

Motivation and Context

Description

Contribution Checklist

Uh oh!

moonbox3 commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Python Unit Test Overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

✓ Correctness

✓ Security Reliability

✓ Test Coverage

✓ Design Approach

Suggestions

Uh oh!

Uh oh!

Uh oh!

moonbox3 left a comment

Choose a reason for hiding this comment

Automated Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

moonbox3 commented Apr 8, 2026 •

edited

Loading