Python: Stop emitting duplicate reasoning content from OpenAI response.reasoning_text.done and response.reasoning_summary_text.done events#5162
Conversation
…crosoft#5157) The OpenAI Responses API sends both reasoning_text.delta (incremental chunks) and reasoning_text.done (full accumulated text) events. The chat client was emitting Content for both, causing ag-ui to append the full done text onto already-accumulated delta text, producing duplicated reasoning output. Stop emitting Content for reasoning_text.done and reasoning_summary_text.done events, matching how output_text.done is already handled (not emitted). The deltas contain all the content; the done event is redundant. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes duplicated reasoning text in the Python OpenAI streaming client by ensuring response.reasoning_text.done and response.reasoning_summary_text.done events no longer emit text_reasoning content (since the corresponding .delta events already stream the full text incrementally). This prevents downstream consumers (notably ag-ui) from double-appending reasoning content.
Changes:
- Stop emitting
Content.from_text_reasoning(...)forresponse.reasoning_text.doneandresponse.reasoning_summary_text.doneevents; keep metadata extraction only. - Update OpenAI chat client tests to assert
.doneevents produce no content (regression coverage for duplication). - Add ag-ui regression tests to verify reasoning delta accumulation behavior and per-delta event emission.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| python/packages/openai/agent_framework_openai/_chat_client.py | Avoid duplicate text_reasoning content by not emitting reasoning content for .done events (metadata-only). |
| python/packages/openai/tests/openai/test_openai_chat_client.py | Update tests to validate that reasoning .done events do not produce contents. |
| python/packages/ag-ui/tests/ag_ui/test_run.py | Add regression tests ensuring reasoning deltas accumulate without duplication and emit one content event per delta. |
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 88%
✓ Correctness
This PR correctly removes the content emission from
response.reasoning_text.doneandresponse.reasoning_summary_text.doneevent handlers in the OpenAI chat client. The.doneevents carry the full accumulated text that was already streamed via.deltaevents, so emitting content from both would cause duplication in downstream consumers like the ag-ui reasoning accumulation (_emit_text_reasoning), which appends each delta toflow.accumulated_reasoning. The tests are updated consistently, and new tests in ag-ui verify that delta accumulation works correctly without duplication. The metadata extraction from.doneevents is preserved, which is correct since metadata should still be captured.
✓ Security Reliability
This PR fixes a content duplication bug where both
response.reasoning_text.deltaandresponse.reasoning_text.done(and their summary counterparts) emittedContent.from_text_reasoning, causing downstream AG-UI reasoning accumulation to duplicate the full reasoning text. The fix correctly removes the content emission from.doneevents while preserving metadata extraction. The.doneevents in the OpenAI Responses API carry the complete accumulated text (which was already streamed via deltas), so emitting it again was redundant. The new tests intest_run.pyproperly validate accumulation behavior, and the updated tests intest_openai_chat_client.pycorrectly assert zero contents from done events. No security or reliability issues found.
✓ Test Coverage
The PR correctly removes duplicate content emission from reasoning done events in the OpenAI chat client and updates corresponding tests. The done-event tests are properly updated to assert empty contents while still verifying metadata propagation. Two new ag-ui tests validate accumulation behavior and per-delta event emission. However, there's no integration-level test that sends a realistic sequence (multiple deltas followed by a done event) through
_parse_chunk_from_openaito verify that the combined output has no duplicated reasoning content—the existing tests verify each event type in isolation. The newtest_reasoning_deltas_accumulate_correctlytest is also somewhat redundant with the pre-existingtest_reasoning_accumulates_incremental_deltastest which does nearly the same thing with 3 deltas.
✓ Design Approach
The fix correctly addresses a real duplication bug: the
response.reasoning_text.doneandresponse.reasoning_summary_text.doneevents were emitting the full accumulated text as a newContent.from_text_reasoningitem, which the ag-ui_emit_text_reasoningaccumulator then appended on top of the already-accumulated delta content, producing doubled output. Suppressing content emission from done events is pragmatic and correct for the common streaming case. However, the fix introduces a fragile assumption: if adoneevent arrives without any precedingdeltaevents (e.g., if the stream resumes from a checkpoint or if the OpenAI API ever batches very short reasoning into only a done event), the reasoning content is silently dropped with no fallback path. A more robust design would be to track whether deltas were received peritem_idand use the done event as a fallback source of truth. Additionally, the new ag-ui tests added intest_run.pyverify delta accumulation behavior that was already tested by the surrounding test class, rather than directly testing the fix itself (which is in_chat_client.py). The fix also creates an asymetry withresponse.code_interpreter_call_code.done, which still emits content from the done event, potentially warranting a similar treatment.
Suggestions
- Consider adding an integration-level test in
test_openai_chat_client.pythat sends a sequence ofresponse.reasoning_text.deltaevents followed by aresponse.reasoning_text.doneevent through_parse_chunk_from_openaiand asserts the total collected contents contain only the delta-produced items with no duplication from the done event. This would validate the bug-fix scenario end-to-end rather than testing each event type in isolation. - The new
test_reasoning_deltas_accumulate_correctlytest intest_run.py(line 1505) is nearly identical to the existingtest_reasoning_accumulates_incremental_deltas(line ~1463). Consider removing the redundant test or differentiating it by testing an edge case (e.g., empty-string deltas or a single delta) to more directly demonstrate the deduplication fix. - The fix assumes
doneevents are always preceded bydeltaevents. A more robust approach would be to track whether any deltas were received for a givenitem_idand emit content from the done event only when no deltas were observed, avoiding silent content loss if the streaming model changes or the API is used in a non-standard way. - The
response.code_interpreter_call_code.donecase still emits content from the done event, creating an asymmetry. If the ag-ui layer ever adds accumulation support for code interpreter content similar to reasoning, this will cause the same duplication bug. A comment noting the intentional asymmetry would help future maintainers.
Automated review by moonbox3's agents
…bserved (microsoft#5157) Address PR review feedback: - Track item_ids that received reasoning deltas via seen_reasoning_delta_item_ids set - Emit content from done events only when no deltas were received for the item_id, preventing silent content loss on stream resumption - Add comment documenting code_interpreter done event asymmetry - Replace redundant ag-ui test with deduplication-focused test - Add integration test for delta+done sequence in OpenAI chat client tests - Add fallback path tests for done events without preceding deltas Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…esponse.reasoning_text.delta" and "response.reasoning_text.done" both get exposed as "text_reasoning"
moonbox3
left a comment
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 97% | Result: All clear
Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach
Automated review by moonbox3's agents
…oft#5157) _emit_text_reasoning now follows the same streaming pattern as _emit_text: - Emits ReasoningStartEvent/ReasoningMessageStartEvent only on the first delta for a given message_id - Emits only ReasoningMessageContentEvent for subsequent deltas - Defers ReasoningMessageEndEvent/ReasoningEndEvent until _close_reasoning_block is called (on content type switch or end-of-run) This produces the correct protocol pattern: ReasoningStartEvent ReasoningMessageStartEvent ReasoningMessageContentEvent(delta1) ReasoningMessageContentEvent(delta2) ReasoningMessageEndEvent ReasoningEndEvent Instead of wrapping every delta in a full Start→End sequence. Backward compatibility is preserved: calling _emit_text_reasoning without a flow argument still produces the full sequence per call. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move inline import of TextMessageContentEvent to the top-level import block and ensure alphabetical ordering to satisfy ruff I001 rule. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…flowEvent The 'event' variable was already typed as WorkflowEvent[Any] from the async for loop at line 590. Reusing it in the _close_reasoning_block loop (which returns list[BaseEvent]) caused an incompatible assignment error. Renamed to 'reasoning_evt' to avoid the conflict. Fixes microsoft#5162 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation and Context
The OpenAI Responses API sends both
.delta(incremental chunks) and.done(full accumulated text) events for reasoning content. The chat client was converting both into identicalContentobjects with typetext_reasoning, causing downstream consumers like ag-ui to duplicate the reasoning text and emit extra events.Fixes #5157
Description
The root cause was in
_chat_client.py: theresponse.reasoning_text.doneandresponse.reasoning_summary_text.donehandlers createdContent.from_text_reasoningobjects identical to the delta events, so the full accumulated text was appended on top of already-streamed deltas. The fix removes theContentemission from both.donehandlers (keeping only metadata extraction), since the delta events already stream the complete text incrementally. Additionally, a guard was added to_parse_structured_response_valueto handle empty text, and regression tests were added in the ag-ui and core test suites to verify that reasoning deltas accumulate correctly without duplication.Contribution Checklist