Python: Use Foundry evaluators to evaluate agent workflows #2322

Merged
moonbox3 merged 14 commits into microsoft:main from salma-elshafey:selshafey/workflow_foundry_eval
Nov 21, 2025

@salma-elshafey

Description

Add a demo on how to use Microsoft Foundry's evaluators to evaluate the agents inside a workflow.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI review requested due to automatic review settings November 19, 2025 13:29
@markwallace-microsoft added the documentation and python labels Nov 19, 2025

Copilot AI left a comment

Pull Request Overview

This PR adds a demo showing how to use Microsoft Foundry's evaluators to evaluate agent workflows in Python. It demonstrates a multi-agent travel planning workflow with 7 specialized agents (hotel search, flight search, activity search, booking confirmation, payment processing, etc.) and evaluates their responses using Azure AI's built-in evaluators (relevance, groundedness, tool call accuracy, and tool output utilization).

  • Implements a comprehensive multi-agent workflow with fan-out/fan-in pattern for travel planning
  • Adds evaluation infrastructure using Azure AI's built-in evaluators to assess agent performance
  • Provides mock travel tools (hotels, flights, activities, bookings, payments) for demonstration
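The fan-out/fan-in pattern mentioned above can be illustrated with a minimal sketch in plain `asyncio`. This is not the code from `create_workflow.py` and does not use the agent framework's API; the function names (`search_hotels`, `search_flights`, `search_activities`, `plan_trip`) and the returned strings are hypothetical stand-ins chosen only to show the shape of the pattern.

```python
import asyncio


# Hypothetical stand-ins for the sample's search agents. The names and the
# returned text are illustrative assumptions, not taken from the PR.
async def search_hotels(request: str) -> dict:
    await asyncio.sleep(0)  # placeholder for a model or tool call
    return {"agent": "hotel-search", "response": f"hotel options for {request}"}


async def search_flights(request: str) -> dict:
    await asyncio.sleep(0)
    return {"agent": "flight-search", "response": f"flight options for {request}"}


async def search_activities(request: str) -> dict:
    await asyncio.sleep(0)
    return {"agent": "activity-search", "response": f"activities for {request}"}


async def plan_trip(request: str) -> dict:
    # Fan-out: the independent search agents run concurrently.
    results = await asyncio.gather(
        search_hotels(request),
        search_flights(request),
        search_activities(request),
    )
    # Fan-in: merge the results into one record, keyed by agent name so that
    # each agent's response can later be evaluated on its own.
    return {
        "request": request,
        "responses": {r["agent"]: r["response"] for r in results},
    }


plan = asyncio.run(plan_trip("3 days in Paris"))
print(sorted(plan["responses"]))
```

Keeping the per-agent responses separate at the fan-in step, rather than concatenating them, is what would make it possible to score each agent individually with evaluators such as relevance or groundedness.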

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 35 comments.

| File | Description |
| --- | --- |
| python/samples/demos/workflow_evaluation/run_evaluation.py | Main evaluation script that orchestrates workflow execution, fetches agent responses, creates evaluations with multiple evaluators, and monitors results |
| python/samples/demos/workflow_evaluation/create_workflow.py | Defines the multi-agent travel planning workflow structure with 7 specialized agents, handles response tracking, and manages workflow execution |
| python/samples/demos/workflow_evaluation/_tools.py | Provides mock implementations of travel-related tools (hotel/flight/activity search, booking, payment) used by agents in the workflow |
| python/samples/demos/workflow_evaluation/README.md | Documents the sample's purpose, evaluation metrics, setup requirements, and usage instructions |

@salma-elshafey salma-elshafey force-pushed the selshafey/workflow_foundry_eval branch from 5781600 to 1f07e5e Compare November 19, 2025 18:00
@salma-elshafey salma-elshafey force-pushed the selshafey/workflow_foundry_eval branch from 959829f to 5a5f702 Compare November 20, 2025 11:03
@moonbox3

@salma-elshafey it looks like there's one conflict; please resolve it.

@moonbox3 moonbox3 enabled auto-merge November 21, 2025 09:49
@moonbox3 moonbox3 added this pull request to the merge queue Nov 21, 2025
Merged via the queue into microsoft:main with commit 4c52903 Nov 21, 2025
23 checks passed
arisng pushed a commit to arisng/agent-framework that referenced this pull request Feb 2, 2026
…#2322)

* Create workflow evaluation with Foundry demo

* Upgrade syntax

* Add copyright line

* import fix

* import fix

* address pr comments

* Python: Workflow eval sample - print evaluator names

* Python: Workflow eval - address PR comments

* Update samples readme

---------

Co-authored-by: Salma Elshafey <selshafey@microsoft.com>
Co-authored-by: Evan Mattson <35585003+moonbox3@users.noreply.github.com>

Labels

documentation, python

6 participants