Python: Preserve file citation and file path annotations in Assistants API streaming responses #4320
…ts API streaming (microsoft#4316) During Assistants API streaming, TextDeltaBlock.text.annotations was ignored when creating Content objects. This caused raw placeholder strings like 【4:0†source】 to pass through to downstream consumers (including AG-UI) instead of being resolved to citation metadata. Map FileCitationDeltaAnnotation and FilePathDeltaAnnotation from delta_block.text.annotations to Annotation objects on the Content, consistent with the existing patterns in _responses_client.py and _chat_client.py. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
moonbox3
left a comment
Automated Code Review
Reviewers: 3 | Confidence: 90%
✓ Correctness
The diff adds annotation handling for file citations and file paths during OpenAI Assistants streaming. The logic correctly gates annotation processing behind a check for delta_block.text.annotations, preserves the existing yield for text without annotations, and the two new test cases adequately cover both annotation types. No correctness bugs, race conditions, or type mismatches were identified. The only observation is significant code duplication between the FileCitationDeltaAnnotation and FilePathDeltaAnnotation branches, but both branches are logically correct.
✓ Security Reliability
This diff adds annotation mapping for file citations and file paths during OpenAI Assistants streaming. The code reads structured data from the OpenAI SDK response objects and maps them into internal Annotation/TextSpanRegion types. All nested fields are properly null-checked before access (e.g., file_citation, file_path, start_index, end_index). The data originates from a trusted source (OpenAI API via the official SDK with typed models), and no user-controlled input flows into dangerous sinks. No secrets, unsafe deserialization, injection risks, or resource leaks are introduced. The change is low-risk from a security and reliability perspective.
✓ Test Coverage
Two new tests cover the happy path for FileCitationDeltaAnnotation and FilePathDeltaAnnotation, asserting type, file_id, annotated_regions, and additional_properties. However, the implementation contains four conditional branches (missing file_citation/file_path, and missing start_index/end_index) that are never exercised by any test. There is also no test for multiple annotations on a single delta block, or for verifying the raw_representation field that is set on each Annotation. The existing tests are meaningful and structurally correct, but the branch coverage gaps are notable.
Suggestions
- The FileCitationDeltaAnnotation and FilePathDeltaAnnotation handling branches are nearly identical (lines 571-604). Consider extracting a helper function to reduce duplication and lower the risk of future divergence between the two branches.
- Consider wrapping the per-annotation processing in a try/except so that a malformed annotation from the API doesn't abort the entire stream. An unexpected attribute or type mismatch on a single annotation would currently propagate up and terminate stream consumption for the whole response.
- The FileCitationDeltaAnnotation and FilePathDeltaAnnotation handling blocks are nearly identical. Extracting a helper would reduce the surface area for divergent bugs if one branch is updated but the other is missed.
- Add a test for FileCitationDeltaAnnotation where file_citation is None (or file_id is None), verifying that no 'file_id' key is set on the resulting Annotation.
- Add a test for an annotation where start_index or end_index is None, verifying that 'annotated_regions' is not set on the resulting Annotation.
- Add a test with multiple annotations on a single TextDeltaBlock to verify they are all collected correctly.
- Consider asserting ann['raw_representation'] in the existing tests to verify the raw OpenAI object is preserved.
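The helper extraction and per-annotation error handling suggested above could look roughly like the sketch below. It is not the code in this PR: `FileRef` and `DeltaAnnotation` are stand-in dataclasses for the SDK delta types, `map_delta_annotation` and `collect_annotations` are hypothetical names, and the result is a plain dict whose keys mirror the names used in this review (`file_id`, `annotated_regions`, `raw_representation`) rather than the framework's actual Annotation type.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional


# Stand-ins for the SDK delta annotation types (assumption: only the
# fields discussed in this review are modelled).
@dataclass
class FileRef:
    file_id: Optional[str] = None


@dataclass
class DeltaAnnotation:
    type: str  # "file_citation" or "file_path"
    start_index: Optional[int] = None
    end_index: Optional[int] = None
    file_citation: Optional[FileRef] = None
    file_path: Optional[FileRef] = None


def map_delta_annotation(raw: Any) -> Optional[dict]:
    """Hypothetical shared helper for both annotation kinds."""
    kind = getattr(raw, "type", None)
    if kind not in ("file_citation", "file_path"):
        return None  # unsupported annotation kind: ignore it
    ann: dict = {"type": kind, "raw_representation": raw}
    # The nested reference object is stored under an attribute named after the kind.
    ref = getattr(raw, kind, None)
    if ref is not None and ref.file_id is not None:
        ann["file_id"] = ref.file_id
    # Only emit a region when both offsets are present.
    if raw.start_index is not None and raw.end_index is not None:
        ann["annotated_regions"] = [
            {"start_index": raw.start_index, "end_index": raw.end_index}
        ]
    return ann


def collect_annotations(raw_annotations: Optional[Iterable[Any]]) -> List[dict]:
    """Map every annotation, skipping malformed ones instead of aborting the stream."""
    collected: List[dict] = []
    for raw in raw_annotations or []:
        try:
            mapped = map_delta_annotation(raw)
        except (AttributeError, TypeError):
            continue  # one bad annotation should not terminate stream consumption
        if mapped is not None:
            collected.append(mapped)
    return collected
```

With a single helper, the missing-branch tests suggested above (no `file_citation`, no `start_index`/`end_index`, multiple annotations per block) only need to be written once rather than once per annotation type.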
Automated review by moonbox3's agents
Pull request overview
This PR fixes a bug where file citation and file path annotations were silently dropped during Assistants API streaming responses. The root cause was that _process_stream_events only extracted the text value from TextDeltaBlock objects without processing the accompanying annotations array, resulting in raw placeholder strings (e.g., 【4:0†source】) being passed through to consumers instead of resolved citation metadata.
Changes:
- Added annotation processing logic in _assistants_client.py to map FileCitationDeltaAnnotation and FilePathDeltaAnnotation to the framework's Annotation type
- Added comprehensive unit tests verifying that both file citation and file path annotations are correctly propagated through streaming responses
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/packages/core/agent_framework/openai/_assistants_client.py | Implemented annotation mapping logic for streaming TextDeltaBlock messages, extracting file_id, text spans, and metadata from delta annotations |
| python/packages/core/tests/openai/test_openai_assistants_client.py | Added two new test cases verifying correct annotation propagation for file_citation and file_path annotation types |
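To illustrate why the preserved annotations matter to downstream consumers, here is a minimal sketch of how a consumer might replace a raw placeholder such as 【4:0†source】 with a numbered citation marker. It assumes annotations arrive as plain dicts with `file_id` and `annotated_regions` keys, and that `start_index` and `end_index` are offsets into the same string being rendered; both are assumptions for illustration, and `render_citations` is a hypothetical name, not part of the framework.

```python
from typing import List, Optional, Tuple


def render_citations(text: str, annotations: List[dict]) -> Tuple[str, List[Optional[str]]]:
    """Replace each annotated span with a marker like [1] and return the cited file IDs."""
    spans = []
    for ann in annotations:
        for region in ann.get("annotated_regions", []):
            spans.append((region["start_index"], region["end_index"], ann.get("file_id")))

    # Number citations in reading order.
    spans.sort(key=lambda span: span[0])
    sources = [file_id for _, _, file_id in spans]

    # Substitute from the end of the string so earlier offsets stay valid.
    # Overlapping spans are not handled in this sketch.
    rendered = text
    for number in range(len(spans), 0, -1):
        start, end, _ = spans[number - 1]
        rendered = rendered[:start] + f"[{number}]" + rendered[end:]
    return rendered, sources
```

Without the annotations, a consumer has only the placeholder text and no offsets or file ID to resolve it against, which is the behavior this PR addresses.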
Motivation and Context
When using the Assistants API with streaming enabled, file citation and file path annotations attached to TextDeltaBlock messages were silently dropped. This meant consumers of streamed responses lost all citation metadata (file IDs, text spans, source references), making it impossible to render inline citations or link back to source files.
Fixes #4316
Description
The root cause was that _process_stream_events constructed Content.from_text() from only the delta text value, ignoring the annotations list on TextDeltaBlock.text. The fix inspects delta_block.text.annotations for FileCitationDeltaAnnotation and FilePathDeltaAnnotation objects, maps each to the framework's Annotation type (including file_id, TextSpanRegion, and raw representation), and attaches them to the Content before yielding the ChatResponseUpdate. Two new unit tests verify correct annotation propagation for both file citation and file path annotation types.
Contribution Checklist