-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Move duplicate sample.output_status to output_item.sample.error if agent/model response genarated failed #44914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR modifies the evaluation result conversion logic to move error status information from sample.output_status to the sample.error field in the AOAI output format when agent or model responses fail to generate.
Changes:
- Modified
_convert_single_row_to_aoai_formatto check for error status in input data and populate the top-levelsamplefield with error information - Updated test data files to include
sample.output_statusfields with both success and error scenarios - Changed behavior so that successful evaluations have
sample: nullinstead of detailed sample information
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py | Added logic to extract error status from input data and populate sample.error field; removed code that populated sample from criteria metrics |
| sdk/evaluation/azure-ai-evaluation/tests/unittests/data/evaluation_util_convert_old_output_test.jsonl | Updated test data to include sample.output_status fields for both success and error cases |
| sdk/evaluation/azure-ai-evaluation/tests/unittests/data/evaluation_util_convert_expected_output.json | Updated expected output to show sample: null for success cases and sample.error for error cases |
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py
Show resolved
Hide resolved
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py
Outdated
Show resolved
Hide resolved
| if (input_data is not None | ||
| and "sample.output_status.status" in input_data | ||
| and isinstance(input_data["sample.output_status.status"], str) | ||
| and not _is_none_or_nan(input_data["sample.output_status.status"]) | ||
| and input_data["sample.output_status.status"] != "success"): |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error checking logic has unnecessary parentheses wrapping the entire condition starting on line 2358. While syntactically correct, the outer parentheses spanning lines 2358-2362 make the code harder to read. Consider removing them or restructuring for better readability.
| if (input_data is not None | |
| and "sample.output_status.status" in input_data | |
| and isinstance(input_data["sample.output_status.status"], str) | |
| and not _is_none_or_nan(input_data["sample.output_status.status"]) | |
| and input_data["sample.output_status.status"] != "success"): | |
| if input_data is not None \ | |
| and "sample.output_status.status" in input_data \ | |
| and isinstance(input_data["sample.output_status.status"], str) \ | |
| and not _is_none_or_nan(input_data["sample.output_status.status"]) \ | |
| and input_data["sample.output_status.status"] != "success": |
| @@ -2354,14 +2354,35 @@ def _convert_single_row_to_aoai_format( | |||
| run_output_results = [] | |||
| top_sample = {} | |||
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initialization top_sample = {} on line 2355 is immediately overwritten by the logic on lines 2358-2372 (either setting it to {"error": error_info} or None). This initialization serves no purpose and could be removed. Consider initializing it as top_sample = None instead for clarity, or remove the initialization entirely since it's set in all branches.
| top_sample = {} |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Description
Please add an informative description that covers that changes made by the pull request and link all relevant issues.
If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines