
Python: Use AI Foundry evaluators for self-reflection#2250

Merged
YusakuNo1 merged 21 commits into main from users/daviwu/self_reflection
Nov 19, 2025

Conversation

@YusakuNo1
Contributor

@YusakuNo1 YusakuNo1 commented Nov 15, 2025

Motivation and Context

Description

Demonstrates how to implement self-reflection with the agent framework.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI review requested due to automatic review settings November 15, 2025 17:04
@markwallace-microsoft markwallace-microsoft added documentation Improvements or additions to documentation python labels Nov 15, 2025
@github-actions github-actions bot changed the title Use AI Foundry evaluators for self-reflection Python: Use AI Foundry evaluators for self-reflection Nov 15, 2025
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds a new Python sample demonstrating self-reflection capabilities for LLM responses using AI Foundry's groundedness evaluators. The sample shows how to iteratively improve LLM responses by evaluating them and providing feedback for refinement.

  • New self-reflection sample using groundedness evaluation
  • Batch processing capability for evaluating multiple prompts
  • Integration with Azure OpenAI and AI Foundry evaluators
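The improve-and-evaluate loop the overview describes can be sketched generically. This is a minimal illustration, not the sample's actual code: `generate` and `evaluate` are hypothetical stand-ins for the Azure OpenAI chat call and the AI Foundry groundedness evaluator, and the score threshold, scale, and feedback prompt wording are assumptions.

```python
# Minimal sketch of a self-reflection loop: generate a response, score it,
# and feed the score back into the prompt until it clears a threshold.
from typing import Callable


def self_reflect(
    prompt: str,
    generate: Callable[[str], str],       # stand-in for the LLM call
    evaluate: Callable[[str, str], float],  # stand-in for the groundedness evaluator
    threshold: float = 4.0,               # assumed 1-5 groundedness scale
    max_iterations: int = 3,
) -> tuple[str, float]:
    """Iteratively regenerate a response until it scores at or above threshold."""
    best_response, best_score = "", float("-inf")
    current_prompt = prompt
    for _ in range(max_iterations):
        response = generate(current_prompt)
        score = evaluate(prompt, response)
        if score > best_score:
            best_response, best_score = response, score
        if score >= threshold:
            break
        # Fold the evaluator's verdict back in so the next attempt can improve.
        current_prompt = (
            f"{prompt}\n\nYour previous answer scored {score:.1f}/5 on "
            f"groundedness. Previous answer: {response}\nPlease improve it."
        )
    return best_response, best_score
```

On each pass the evaluator's score is folded back into the prompt, so later generations can correct the grounding issues the evaluator flagged while the best-scoring response so far is retained.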

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 7 comments.

File Description

  • python/samples/getting_started/observability/self_reflection.py: New sample implementing a self-reflection loop with groundedness evaluation for LLM responses
  • python/samples/getting_started/observability/resources/suboptimal_groundedness_prompts.parquet: Test data file containing prompts for self-reflection evaluation
  • python/samples/getting_started/observability/.env.example: Added Azure OpenAI configuration variables for the self-reflection sample
  • python/samples/README.md: Added reference to the new self-reflection sample
Comments suppressed due to low confidence (2)

python/samples/getting_started/observability/.env.example:21

  • The model name "gpt-4.1" in the .env.example file doesn't appear to be a valid Azure OpenAI model deployment name. This should be updated to a valid model name like "gpt-4", "gpt-4o", or "gpt-4-turbo".

python/samples/getting_started/observability/self_reflection.py:161

  • This statement is unreachable.
        best_response = raw_response.choices[0].message.content

@YusakuNo1 YusakuNo1 enabled auto-merge November 16, 2025 03:11
@YusakuNo1 YusakuNo1 disabled auto-merge November 16, 2025 04:23
@YusakuNo1 YusakuNo1 enabled auto-merge November 16, 2025 04:38
Member

@eavanvalkenburg eavanvalkenburg left a comment


Love the idea of this, but could you have a look at our ChatMiddleware and implement it using that? That way it becomes a much more native way of working, and if you put the actual middleware function inside a class, then all the setup, best_score, etc. can be captured and maintained. This sample shows class-based middleware; the same works for ChatMiddleware, but this could also be implemented with AgentMiddleware: https://github.com/microsoft/agent-framework/blob/main/python/samples/getting_started/middleware/class_based_middleware.py
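The class-based middleware pattern suggested in this comment can be illustrated generically. This sketch does not use the real agent-framework ChatMiddleware API; the class name and the `__call__` signature are hypothetical, showing only how instance attributes can capture setup and running state such as `best_score` across calls.

```python
# Generic sketch of class-based middleware: the instance holds configuration
# and mutable state, and __call__ wraps the inner handler.
from typing import Callable


class SelfReflectionMiddleware:
    """Intercepts a chat call and tracks the best evaluation score seen."""

    def __init__(self, threshold: float = 4.0):
        self.threshold = threshold
        self.best_score = float("-inf")  # state survives across calls

    def __call__(self, request: str, next_handler: Callable[[str], str]) -> str:
        # Run the wrapped handler, then score and record its output.
        response = next_handler(request)
        score = self._evaluate(request, response)
        if score > self.best_score:
            self.best_score = score
        return response

    def _evaluate(self, request: str, response: str) -> float:
        # Stand-in for a real evaluator; returns 1.0 for any non-empty response.
        return float(len(response) > 0)
```

Because the state lives on the instance, one middleware object can accumulate results over many chat invocations, which is the "captured and maintained" property the comment describes.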

@YusakuNo1
Contributor Author

Love the idea of this, but could you have a look at our ChatMiddleware and implement it using that? That way it becomes a much more native way of working, and if you put the actual middleware function inside a class, then all the setup, best_score, etc. can be captured and maintained. This sample shows class-based middleware; the same works for ChatMiddleware, but this could also be implemented with AgentMiddleware: https://github.com/microsoft/agent-framework/blob/main/python/samples/getting_started/middleware/class_based_middleware.py

Hi @eavanvalkenburg, thanks for the suggestion! This approach is not implemented as middleware because it's slow and consumes a lot of tokens; we would like AI developers to try it as an optional feature rather than putting it in the core path. What do you think?

@YusakuNo1
Contributor Author

Love the idea of this, but could you have a look at our ChatMiddleware and implement it using that? That way it becomes a much more native way of working, and if you put the actual middleware function inside a class, then all the setup, best_score, etc. can be captured and maintained. This sample shows class-based middleware; the same works for ChatMiddleware, but this could also be implemented with AgentMiddleware: https://github.com/microsoft/agent-framework/blob/main/python/samples/getting_started/middleware/class_based_middleware.py

Within the team, we discussed the middleware approach offline. Middleware is meant to intercept traffic and do something extra, but in our current use case the self-reflection code also modifies the original user input and then runs again, so it may not be a good fit for middleware. Also, since this is an observability use case, maybe we can keep it simple for the user for now. What do you think?

Contributor

@TaoChenOSU TaoChenOSU left a comment


I probably wouldn't put this sample in the observability samples because it does not have anything to do with observability.

Maybe a new evaluation folder?

@YusakuNo1
Contributor Author

I probably wouldn't put this sample in the observability samples because it does not have anything to do with observability.

Maybe a new evaluation folder?

By our organization's definition, "observability" includes both tracing and evaluation... I think we can host the sample under observability for now?

@TaoChenOSU
Contributor

I probably wouldn't put this sample in the observability samples because it does not have anything to do with observability.
Maybe a new evaluation folder?

By our organization's definition, "observability" includes both tracing and evaluation... I think we can host the sample under observability for now?

What do you think @eavanvalkenburg? If we want to keep evaluation under observability, could we change the name of the sample to evaluation_with_self_reflection?

Member

@eavanvalkenburg eavanvalkenburg left a comment


Some small notes, but looks good overall

YusakuNo1 and others added 2 commits November 19, 2025 08:57
…luation/self_reflection.py

Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
@YusakuNo1 YusakuNo1 added this pull request to the merge queue Nov 19, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 19, 2025
@YusakuNo1 YusakuNo1 added this pull request to the merge queue Nov 19, 2025
Merged via the queue into main with commit b3e96b8 Nov 19, 2025
23 checks passed
@YusakuNo1 YusakuNo1 deleted the users/daviwu/self_reflection branch November 19, 2025 18:47
arisng pushed a commit to arisng/agent-framework that referenced this pull request Feb 2, 2026
* First working version

* Simplify the implementations

* Remove unused env var

* Update Python syntax

* Address feedbacks

* Fix a typo

* Update names as review suggestions

* Citation for self-reflection

* Move to independent folder

* Update python/samples/getting_started/evaluation/azure_ai_foundry/evaluation/README.md

Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>

* Updated from parquet to JSONL and hide the default environment variables

* As review feedback, remove the purpose of using `run_self_reflection_batch` as a library, only use it as sample code

* Update python/samples/getting_started/evaluation/azure_ai_foundry/evaluation/self_reflection.py

Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>

---------

Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>

Labels

documentation (Improvements or additions to documentation), python

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

7 participants