Python: Use AI Foundry evaluators for self-reflection #2250
Conversation
Pull Request Overview
This PR adds a new Python sample demonstrating self-reflection capabilities for LLM responses using AI Foundry's groundedness evaluators. The sample shows how to iteratively improve LLM responses by evaluating them and providing feedback for refinement.
- New self-reflection sample using groundedness evaluation
- Batch processing capability for evaluating multiple prompts
- Integration with Azure OpenAI and AI Foundry evaluators
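The iterative improve-by-evaluation loop described above can be sketched roughly as follows. Here `generate` and `evaluate` are hypothetical stand-ins for the Azure OpenAI chat call and the AI Foundry groundedness evaluator; this is a sketch of the technique, not the sample's actual code.

```python
from typing import Callable

def self_reflect(
    generate: Callable[[str], str],         # produces a response for a prompt
    evaluate: Callable[[str, str], float],  # scores groundedness, e.g. 1-5
    prompt: str,
    threshold: float = 4.0,
    max_rounds: int = 3,
) -> tuple[str, float]:
    """Regenerate a response until its score meets the threshold, keeping the best."""
    best_response, best_score = "", float("-inf")
    current_prompt = prompt
    for _ in range(max_rounds):
        response = generate(current_prompt)
        score = evaluate(current_prompt, response)
        if score > best_score:
            best_response, best_score = response, score
        if score >= threshold:
            break
        # Feed the score back so the next attempt can improve on the last answer.
        current_prompt = (
            f"{prompt}\n\nYour previous answer scored {score}/5 on "
            f"groundedness. Revise it:\n{response}"
        )
    return best_response, best_score
```

Keeping the generator and evaluator injectable makes the loop easy to test without credentials, and mirrors the batch use case: the same function can be mapped over a file of prompts.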
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `python/samples/getting_started/observability/self_reflection.py` | New sample implementing a self-reflection loop with groundedness evaluation for LLM responses |
| `python/samples/getting_started/observability/resources/suboptimal_groundedness_prompts.parquet` | Test data file containing prompts for self-reflection evaluation |
| `python/samples/getting_started/observability/.env.example` | Added Azure OpenAI configuration variables for the self-reflection sample |
| `python/samples/README.md` | Added a reference to the new self-reflection sample |
Comments suppressed due to low confidence (2)

`python/samples/getting_started/observability/.env.example:21`
- The model name "gpt-4.1" in the .env.example file doesn't appear to be a valid Azure OpenAI model deployment name. This should be updated to a valid model name like "gpt-4", "gpt-4o", or "gpt-4-turbo".

`python/samples/getting_started/observability/self_reflection.py:161`
- This statement is unreachable: `best_response = raw_response.choices[0].message.content`
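For readers without the diff open: an "unreachable statement" warning usually means the line sits after a point where every control-flow path has already returned or broken out. An illustrative (not actual) reduction of the pattern:

```python
# Illustrative only: a statement placed after all paths return can never run.
def pick_best(candidates):
    for c in candidates:
        if c.score >= 4:
            return c.text       # early exit on a good-enough candidate
    return candidates[-1].text  # fallback: every path above now returns
    best_text = candidates[0].text  # unreachable: flagged by static analysis
```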
eavanvalkenburg left a comment:
Love the idea of this, but could you have a look at our ChatMiddleware and implement it using that? That way it becomes a much more native way of working, and if you put the actual middleware function inside a class, then all the setup, best_score, etc. can be captured and maintained. This sample shows classes for ChatMiddleware, but the same could also be implemented with AgentMiddleware: https://github.com/microsoft/agent-framework/blob/main/python/samples/getting_started/middleware/class_based_middleware.py
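The suggested class-based middleware pattern could be sketched roughly like this. The `context` object and `call_next` signature are illustrative placeholders, not the real agent-framework API; the point is that loop state such as `best_score` lives on the instance.

```python
class SelfReflectionMiddleware:
    """Hypothetical sketch: self-reflection wrapped as class-based chat middleware."""

    def __init__(self, evaluate, threshold: float = 4.0, max_rounds: int = 3):
        self.evaluate = evaluate          # scores (prompt, response) -> float
        self.threshold = threshold
        self.max_rounds = max_rounds
        self.best_score = float("-inf")   # setup and state captured on the instance

    async def __call__(self, context, call_next):
        for _ in range(self.max_rounds):
            await call_next(context)      # run the inner chat completion
            score = self.evaluate(context.prompt, context.response)
            self.best_score = max(self.best_score, score)
            if score >= self.threshold:
                break
            # Feed the score back into the prompt for the next attempt.
            context.prompt += f"\n(Previous answer scored {score}/5; revise.)"
```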
Hi @eavanvalkenburg, thanks for the suggestion! This approach is not added as middleware because it is slow and consumes a lot of tokens; we would like AI developers to try it as something "optional" instead of putting it in the core path. What do you think?
Within the team, we discussed the middleware approach offline. Middleware typically intercepts traffic and does something extra, but in our current use case the self-reflection code also modifies the original user input and then runs again, so it may not be a good fit for middleware. Also, since this is an observability use case, maybe we can keep it simple for the user for now. What do you think?
TaoChenOSU left a comment:
I probably wouldn't put this sample in the observability samples because it does not have anything to do with observability.
Maybe a new evaluation folder?
In terms of how we organize things, our "observability" includes both tracing and evaluation... I think we can host the sample under observability for now?
What do you think @eavanvalkenburg? If we want to keep evaluation under observability, could we change the name of the sample to
…gent-framework into users/daviwu/self_reflection
…luation/README.md Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
…batch` as a library, only use it as sample code
…gent-framework into users/daviwu/self_reflection
eavanvalkenburg left a comment:
Some small notes, but looks good overall
…luation/self_reflection.py Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
* First working version
* Simplify the implementations
* Remove unused env var
* Update Python syntax
* Address feedbacks
* Fix a typo
* Update names as review suggestions
* Citation for self-reflection
* Move to independent folder
* Update python/samples/getting_started/evaluation/azure_ai_foundry/evaluation/README.md (Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>)
* Updated from parquet to JSONL and hide the default environment variables
* As review feedback, remove the purpose of using `run_self_reflection_batch` as a library, only use it as sample code
* Update python/samples/getting_started/evaluation/azure_ai_foundry/evaluation/self_reflection.py (Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>)

Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
Motivation and Context
Description
Demonstrate how to implement self-reflection with the Agent Framework.
Contribution Checklist