Python: Preserve file citation and file path annotations in Assistants API streaming responses #4320

Merged
moonbox3 merged 1 commit into microsoft:main from moonbox3:agent/fix-4316-1 on Feb 26, 2026

@moonbox3
Contributor

Motivation and Context

When using the Assistants API with streaming enabled, file citation and file path annotations attached to TextDeltaBlock messages were silently dropped. This meant consumers of streamed responses lost all citation metadata (file IDs, text spans, source references), making it impossible to render inline citations or link back to source files.

Fixes #4316

Description

The root cause was that _process_stream_events constructed Content.from_text() from only the delta text value, ignoring the annotations list on TextDeltaBlock.text. The fix inspects delta_block.text.annotations for FileCitationDeltaAnnotation and FilePathDeltaAnnotation objects, maps each to the framework's Annotation type (including file_id, TextSpanRegion, and raw representation), and attaches them to the Content before yielding the ChatResponseUpdate. Two new unit tests verify correct annotation propagation for both file citation and file path annotation types.
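
The mapping described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: the dataclasses are simplified stand-ins for the OpenAI SDK delta annotation models, the function name map_delta_annotations is hypothetical, and the result is a plain dict rather than the framework's Annotation and TextSpanRegion types. The key names follow those mentioned in the review of this PR; the literal values ("citation", "text_span") and the contents of additional_properties are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


# Simplified stand-ins for the SDK's delta annotation models (assumed shapes).
@dataclass
class FileRef:
    file_id: Optional[str] = None


@dataclass
class FileCitationDeltaAnnotation:
    index: int
    text: Optional[str] = None
    file_citation: Optional[FileRef] = None
    start_index: Optional[int] = None
    end_index: Optional[int] = None


@dataclass
class FilePathDeltaAnnotation:
    index: int
    text: Optional[str] = None
    file_path: Optional[FileRef] = None
    start_index: Optional[int] = None
    end_index: Optional[int] = None


def map_delta_annotations(annotations: Optional[List[Any]]) -> List[Dict[str, Any]]:
    """Map delta annotations to plain-dict annotations, skipping unknown types."""
    mapped: List[Dict[str, Any]] = []
    for ann in annotations or []:
        if isinstance(ann, FileCitationDeltaAnnotation):
            source, kind = ann.file_citation, "file_citation"
        elif isinstance(ann, FilePathDeltaAnnotation):
            source, kind = ann.file_path, "file_path"
        else:
            continue  # annotation type this sketch does not handle

        out: Dict[str, Any] = {
            "type": "citation",  # assumed value
            "additional_properties": {"kind": kind, "text": ann.text},  # assumed contents
            "raw_representation": ann,  # keep the original object
        }
        # Only set file_id when the nested object and its id are both present.
        if source is not None and source.file_id is not None:
            out["file_id"] = source.file_id
        # Only set a text span when both bounds are present.
        if ann.start_index is not None and ann.end_index is not None:
            out["annotated_regions"] = [
                {"type": "text_span", "start_index": ann.start_index, "end_index": ann.end_index}
            ]
        mapped.append(out)
    return mapped
```

In the real client, the resulting annotations would be attached to the Content built from the delta text before the ChatResponseUpdate is yielded; that step is omitted here because it depends on framework types.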

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Note: PR autogenerated by moonbox3's agent

…ts API streaming (microsoft#4316)

During Assistants API streaming, TextDeltaBlock.text.annotations was
ignored when creating Content objects. This caused raw placeholder
strings like 【4:0†source】 to pass through to downstream consumers
(including AG-UI) instead of being resolved to citation metadata.

Map FileCitationDeltaAnnotation and FilePathDeltaAnnotation from
delta_block.text.annotations to Annotation objects on the Content,
consistent with the existing patterns in _responses_client.py and
_chat_client.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 26, 2026 12:44
Contributor Author

@moonbox3 moonbox3 left a comment

Automated Code Review

Reviewers: 3 | Confidence: 90%

✓ Correctness

The diff adds annotation handling for file citations and file paths during OpenAI Assistants streaming. The logic correctly gates annotation processing behind a check for delta_block.text.annotations, preserves the existing yield for text without annotations, and the two new test cases adequately cover both annotation types. No correctness bugs, race conditions, or type mismatches were identified. The only observation is significant code duplication between the FileCitationDeltaAnnotation and FilePathDeltaAnnotation branches, but both branches are logically correct.

✓ Security Reliability

This diff adds annotation mapping for file citations and file paths during OpenAI Assistants streaming. The code reads structured data from the OpenAI SDK response objects and maps them into internal Annotation/TextSpanRegion types. All nested fields are properly null-checked before access (e.g., file_citation, file_path, start_index, end_index). The data originates from a trusted source (OpenAI API via the official SDK with typed models), and no user-controlled input flows into dangerous sinks. No secrets, unsafe deserialization, injection risks, or resource leaks are introduced. The change is low-risk from a security and reliability perspective.

✓ Test Coverage

Two new tests cover the happy path for FileCitationDeltaAnnotation and FilePathDeltaAnnotation, asserting type, file_id, annotated_regions, and additional_properties. However, the implementation contains four conditional branches (missing file_citation/file_path, and missing start_index/end_index) that are never exercised by any test. There is also no test for multiple annotations on a single delta block, or for verifying the raw_representation field that is set on each Annotation. The existing tests are meaningful and structurally correct, but the branch coverage gaps are notable.

Suggestions

  • The FileCitationDeltaAnnotation and FilePathDeltaAnnotation handling branches are nearly identical (lines 571-604). Consider extracting a helper function to reduce duplication and the risk of divergent bugs if one branch is updated but the other is missed.
  • Consider wrapping the per-annotation processing in a try/except so that a malformed annotation from the API doesn't abort the entire stream. An unexpected attribute or type mismatch on a single annotation would currently propagate up and terminate stream consumption for the whole response.
  • Add a test for FileCitationDeltaAnnotation where file_citation is None (or file_id is None), verifying that no 'file_id' key is set on the resulting Annotation.
  • Add a test for an annotation where start_index or end_index is None, verifying that 'annotated_regions' is not set on the resulting Annotation.
  • Add a test with multiple annotations on a single TextDeltaBlock to verify they are all collected correctly.
  • Consider asserting ann['raw_representation'] in the existing tests to verify the raw OpenAI object is preserved.
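
Several of the suggestions above (a shared helper, per-annotation error handling, and tests for the untested branches) could look something like the sketch below. It is self-contained and hypothetical: build_annotation and collect_annotations are illustrative names that do not exist in the framework, SimpleNamespace objects stand in for the SDK annotation models, dispatching on a type attribute is an assumption, and the output is a plain dict rather than the framework's Annotation type.

```python
import logging
from types import SimpleNamespace
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)

# Maps the annotation's `type` discriminator (assumed values) to the
# attribute that holds the nested object carrying file_id.
_SOURCE_ATTR = {"file_citation": "file_citation", "file_path": "file_path"}


def build_annotation(ann: Any, source_attr: str) -> Dict[str, Any]:
    """Shared builder for both annotation kinds (hypothetical helper)."""
    out: Dict[str, Any] = {"type": "citation", "raw_representation": ann}
    source = getattr(ann, source_attr, None)
    file_id = getattr(source, "file_id", None)
    if file_id is not None:
        out["file_id"] = file_id
    start, end = ann.start_index, ann.end_index
    if start is not None and end is not None:
        out["annotated_regions"] = [
            {"type": "text_span", "start_index": start, "end_index": end}
        ]
    return out


def collect_annotations(annotations: Optional[List[Any]]) -> List[Dict[str, Any]]:
    """Build annotations one by one; a malformed item is logged and skipped."""
    collected: List[Dict[str, Any]] = []
    for ann in annotations or []:
        try:
            source_attr = _SOURCE_ATTR.get(getattr(ann, "type", None))
            if source_attr is None:
                continue  # unknown annotation type
            collected.append(build_annotation(ann, source_attr))
        except Exception:
            # Keep the stream alive; only this annotation is dropped.
            logger.warning("Skipping malformed annotation", exc_info=True)
    return collected


def test_missing_file_citation_sets_no_file_id() -> None:
    ann = SimpleNamespace(type="file_citation", file_citation=None, start_index=0, end_index=5)
    (result,) = collect_annotations([ann])
    assert "file_id" not in result
    assert result["raw_representation"] is ann


def test_missing_index_sets_no_regions() -> None:
    ann = SimpleNamespace(
        type="file_path",
        file_path=SimpleNamespace(file_id="file-1"),
        start_index=None,
        end_index=5,
    )
    (result,) = collect_annotations([ann])
    assert result["file_id"] == "file-1"
    assert "annotated_regions" not in result


def test_multiple_annotations_with_one_malformed() -> None:
    first = SimpleNamespace(
        type="file_citation",
        file_citation=SimpleNamespace(file_id="file-a"),
        start_index=0,
        end_index=3,
    )
    malformed = SimpleNamespace(type="file_citation")  # no start_index/end_index attributes
    second = SimpleNamespace(
        type="file_path",
        file_path=SimpleNamespace(file_id="file-b"),
        start_index=4,
        end_index=9,
    )
    results = collect_annotations([first, malformed, second])
    assert [r["file_id"] for r in results] == ["file-a", "file-b"]
```

Whether swallowing the exception is the right trade-off is a design choice: it keeps the stream alive at the cost of silently losing one citation, so logging the failure (as above) is the minimum needed to keep it visible.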

Automated review by moonbox3's agents

@markwallace-microsoft
Member

Python Test Coverage

Python Test Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| packages/core/agent_framework/openai/_assistants_client.py | 293 | 35 | 88% | 411, 413, 415, 418, 422–423, 426, 429, 434–435, 437, 440–442, 447, 458, 483, 485, 487, 489, 491, 496, 499, 502, 506, 517, 646, 732, 735, 764, 801–804, 874 |
| TOTAL | 22192 | 2762 | 87% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 4674 | 247 💤 | 0 ❌ | 0 🔥 | 1m 22s ⏱️ |

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a bug where file citation and file path annotations were silently dropped during Assistants API streaming responses. The root cause was that _process_stream_events only extracted the text value from TextDeltaBlock objects without processing the accompanying annotations array, resulting in raw placeholder strings (e.g., 【4:0†source】) being passed through to consumers instead of resolved citation metadata.

Changes:

  • Added annotation processing logic in _assistants_client.py to map FileCitationDeltaAnnotation and FilePathDeltaAnnotation to the framework's Annotation type
  • Added comprehensive unit tests verifying that both file citation and file path annotations are correctly propagated through streaming responses

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| python/packages/core/agent_framework/openai/_assistants_client.py | Implemented annotation mapping logic for streaming TextDeltaBlock messages, extracting file_id, text spans, and metadata from delta annotations |
| python/packages/core/tests/openai/test_openai_assistants_client.py | Added two new test cases verifying correct annotation propagation for file_citation and file_path annotation types |

@moonbox3 moonbox3 added this pull request to the merge queue Feb 26, 2026
Merged via the queue into microsoft:main with commit e0461b4 Feb 26, 2026
34 checks passed

Development

Successfully merging this pull request may close these issues.

.NET: Python: [Bug]: File Citation Annotations Lost in Assistants API Streaming

5 participants