Python: Use Foundry evaluators to evaluate agent workflows #2322
Merged
moonbox3 merged 14 commits into microsoft:main on Nov 21, 2025
Pull Request Overview
This PR adds a demo showing how to use Microsoft Foundry's evaluators to evaluate agent workflows in Python. It demonstrates a multi-agent travel planning workflow with 7 specialized agents (hotel search, flight search, activity search, booking confirmation, payment processing, etc.) and evaluates their responses using Azure AI's built-in evaluators (relevance, groundedness, tool call accuracy, and tool output utilization).
- Implements a comprehensive multi-agent workflow with fan-out/fan-in pattern for travel planning
- Adds evaluation infrastructure using Azure AI's built-in evaluators to assess agent performance
- Provides mock travel tools (hotels, flights, activities, bookings, payments) for demonstration
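
The fan-out/fan-in pattern with mock tools described above can be sketched roughly as follows. This is a minimal illustration using plain `asyncio`, not the sample's actual code or the agent framework's API; the names `AgentResult`, `search_hotels`, `search_flights`, `search_activities`, and `plan_trip` are hypothetical, and the returned strings are canned placeholder data.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class AgentResult:
    """Hypothetical container for one agent's output."""
    agent: str
    output: str


# Hypothetical mock tools: canned strings stand in for real travel APIs.
async def search_hotels(city: str) -> AgentResult:
    await asyncio.sleep(0)  # yield control, as a real network call would
    return AgentResult("hotel_search", f"Mock hotel in {city}")


async def search_flights(city: str) -> AgentResult:
    await asyncio.sleep(0)
    return AgentResult("flight_search", f"Mock flight to {city}")


async def search_activities(city: str) -> AgentResult:
    await asyncio.sleep(0)
    return AgentResult("activity_search", f"Mock walking tour of {city}")


async def plan_trip(city: str) -> dict[str, str]:
    # Fan-out: run the independent searches concurrently.
    results = await asyncio.gather(
        search_hotels(city),
        search_flights(city),
        search_activities(city),
    )
    # Fan-in: merge the per-agent outputs into one structure that a
    # downstream step (for example, booking) could consume.
    return {result.agent: result.output for result in results}


if __name__ == "__main__":
    print(asyncio.run(plan_trip("Paris")))
```

Keeping each agent's output under its own key is one way to make per-agent responses easy to hand to an evaluator later; the real sample may track responses differently.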
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 35 comments.
| File | Description |
|---|---|
| python/samples/demos/workflow_evaluation/run_evaluation.py | Main evaluation script that orchestrates workflow execution, fetches agent responses, creates evaluations with multiple evaluators, and monitors results |
| python/samples/demos/workflow_evaluation/create_workflow.py | Defines the multi-agent travel planning workflow structure with 7 specialized agents, handles response tracking, and manages workflow execution |
| python/samples/demos/workflow_evaluation/_tools.py | Provides mock implementations of travel-related tools (hotel/flight/activity search, booking, payment) used by agents in the workflow |
| python/samples/demos/workflow_evaluation/README.md | Documents the sample's purpose, evaluation metrics, setup requirements, and usage instructions |
5781600 to 1f07e5e
TaoChenOSU reviewed on Nov 19, 2025
959829f to 5a5f702
TaoChenOSU reviewed on Nov 20, 2025
TaoChenOSU approved these changes on Nov 20, 2025
TaoChenOSU approved these changes on Nov 20, 2025
moonbox3 approved these changes on Nov 21, 2025
@salma-elshafey looks like there's one conflict to resolve, please.
moonbox3 approved these changes on Nov 21, 2025
arisng pushed a commit to arisng/agent-framework that referenced this pull request on Feb 2, 2026
…#2322)
* Create workflow evaluation with Foundry demo
* Upgrade syntax
* Add copyright line
* import fix
* import fix
* address pr comments
* Python: Workflow eval sample - print evaluator names
* Python: Workflow eval - address PR comments
* Update samples readme

Co-authored-by: Salma Elshafey <selshafey@microsoft.com>
Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>
Description
Add a demo on how to use Microsoft Foundry's evaluators to evaluate the agents inside a workflow.
Contribution Checklist