openai-agents: Handle various types of span_data.response.output parts#4208
openai-agents: Handle various types of span_data.response.output parts#4208adammw wants to merge 7 commits intoopen-telemetry:mainfrom
Conversation
203d9f0 to
29227af
Compare
There was a problem hiding this comment.
Seems that when include_sensitive_data=False, the arguments here become {"queries": ["readacted"]} (a dict wrapping a list), but when True, getattr(item, "queries", None) returns a bare list.
Question:
- Would consumers parsing this output see different shapes depending on the sensitivity setting? Should the
Truepath also wrap in{"queries": ...}to keep the structure consistent?
There was a problem hiding this comment.
think this was a typo or confusion of the scope of the if statement, added brackets to make it more obvious. the shape should remain the same.
There was a problem hiding this comment.
think this was a typo or confusion of the scope of the if statement, added brackets to make it more obvious. the shape should remain the same.
Thanks for the quick fix, it's clear to me now!
There was a problem hiding this comment.
Turns out none of this is required - when include_sensitive_data=False the method never gets called so all this logic can go.
There was a problem hiding this comment.
nit: "readacted" → "redacted"? Looks like this may be inherited from the existing codebase, so maybe a separate cleanup PR if the project wants to address it
| "stop", | ||
| ) | ||
|
|
||
| # Check output messages are properly normalized |
There was a problem hiding this comment.
Nice coverage of all the output types! Related to my comment above:
Since the instrumentor defaults to capture_message_content=SPAN_AND_EVENT (i.e. include_sensitive_data=True) when no kwarg is passed, seems all these assertions exercise the non-redacted path. Would it be worth adding a companion test with capture_message_content=False to verify the redaction paths produce the expected shape for a few of the more complex types (e.g., mcp_call, code_interpreter_call)?
There was a problem hiding this comment.
Added the test_response_span_records_redacted_response_attributes test but the redaction is basically absolute - the output message attribute is not included at all - so none of that code is necessary.
There was a problem hiding this comment.
Speaking of output shape, this will vary here depending on sensitivity, but I couldn't think of a better way to do it.
There was a problem hiding this comment.
Ack, probably unavoidable here without adding a lot of complexity. I think consumers would need to handle both cases regardless. Not a blocker.
f69c84a to
c7ca55f
Compare
c7ca55f to
fa628a6
Compare
Description
Handle various defined types of span_data.response.output parts, and preference parsing the parts over using the consolidated output_text attribute. Some
finish_reasonlogic has also been touched, however I cannot find documentation that they have ever been returned from the Responses API.†text schema does not define annotations which are added as a custom key to preserve as much metadata as possible
*custom/non-standard - no equivalent exists in current GenAI Semantic Conventions schema
Fixes #4185
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Does This PR Require a Core Repo Change?
Checklist:
See contributing.md for styleguide, changelog guidelines, and more.