-
Notifications
You must be signed in to change notification settings - Fork 3.2k
[Eval SDK] Fix argument type in agent response reformatting for evaluators #44897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a bug in the agent response reformatting functionality for evaluators. The issue was that tool call arguments were always being formatted with quotes around them, regardless of their actual type. This caused non-string arguments (numbers, booleans, None, etc.) to be incorrectly displayed as strings.
Changes:
- Added a new
_format_valuehelper function to properly format different argument types - Updated
_get_agent_responseto use the new formatting function for tool call arguments
| def _format_value(v): | ||
| if isinstance(v, str): | ||
| return f'"{v}"' | ||
| elif v is None: | ||
| return 'None' | ||
| else: | ||
| return str(v) |
Copilot
AI
Jan 28, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _format_value function uses str() for complex types (dicts, lists), which produces Python-style representations (e.g., {'key': 'value'} instead of JSON-style {"key": "value"}). For tool call arguments that may contain nested structures, consider whether json.dumps() would be more appropriate to ensure consistent JSON-style formatting. This would produce output like ranking_options={"ranker": "default_2024_08_21", "score_threshold": 0.0} instead of ranking_options={'ranker': 'default_2024_08_21', 'score_threshold': 0.0}.
| def _format_value(v): | ||
| if isinstance(v, str): | ||
| return f'"{v}"' | ||
| elif v is None: | ||
| return 'None' | ||
| else: | ||
| return str(v) |
Copilot
AI
Jan 28, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bug fix improves argument formatting in tool call messages, but test coverage should be expanded to verify the behavior with different argument types. Consider adding test cases to test__get_agent_response_with_tool_messages that verify formatting with:
- Numeric arguments (integers and floats like
score_threshold: 0.0) - Boolean arguments (e.g.,
enabled: true) - None/null values
- Nested dictionaries (e.g.,
ranking_options: {"ranker": "default", "score_threshold": 0.0})
This would ensure the fix works correctly for all the data types that can appear in tool call arguments.
| def _format_value(v): | ||
| if isinstance(v, str): | ||
| return f'"{v}"' | ||
| elif v is None: | ||
| return 'None' | ||
| else: | ||
| return str(v) |
Copilot
AI
Jan 28, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _format_value helper function lacks a docstring explaining its purpose and behavior. Consider adding a docstring that describes:
- The purpose: Format tool call argument values for display in agent response text
- Parameter type and meaning
- Return value and format
- Examples of how different types are formatted (strings, numbers, None, booleans, complex types)
This would improve code maintainability and help other developers understand the formatting rules.
| def _format_value(v): | ||
| if isinstance(v, str): | ||
| return f'"{v}"' | ||
| elif v is None: | ||
| return 'None' | ||
| else: | ||
| return str(v) |
Copilot
AI
Jan 28, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _format_value function handles booleans using str(), which produces Python-style capitalized boolean strings ("True" or "False"). Consider whether lowercase booleans ("true" or "false") would be more appropriate for consistency with JSON conventions, especially since tool call arguments often originate from JSON data. If the current behavior is intentional for Python-style output, this is acceptable.
Description
Please add an informative description that covers that changes made by the pull request and link all relevant issues.
If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines