Merged
21 commits
646485b
First working version
YusakuNo1 Nov 15, 2025
bbe6eb4
Simplify the implementations
YusakuNo1 Nov 15, 2025
30f08a3
Merge branch 'main' into users/daviwu/self_reflection
YusakuNo1 Nov 15, 2025
03999bc
Remove unused env var
YusakuNo1 Nov 15, 2025
387ba9b
Update Python syntax
YusakuNo1 Nov 16, 2025
70e25be
Address feedbacks
YusakuNo1 Nov 16, 2025
4a103be
Fix a typo
YusakuNo1 Nov 16, 2025
f605133
Merge branch 'main' into users/daviwu/self_reflection
YusakuNo1 Nov 17, 2025
2e0b758
Merge branch 'main' into users/daviwu/self_reflection
YusakuNo1 Nov 17, 2025
14ca874
Update names as review suggestions
YusakuNo1 Nov 18, 2025
c65edba
Citation for self-reflection
YusakuNo1 Nov 18, 2025
a0f9479
Merge branch 'users/daviwu/self_reflection' of github.com:microsoft/a…
YusakuNo1 Nov 18, 2025
a691d8e
Merge branch 'main' into users/daviwu/self_reflection
YusakuNo1 Nov 18, 2025
165fdde
Move to independent folder
YusakuNo1 Nov 18, 2025
3d8be84
Update python/samples/getting_started/evaluation/azure_ai_foundry/eva…
YusakuNo1 Nov 18, 2025
e189301
Merge branch 'main' into users/daviwu/self_reflection
YusakuNo1 Nov 18, 2025
1bde318
Updated from parquet to JSONL and hide the default environment variables
YusakuNo1 Nov 18, 2025
fff8ff0
As review feedback, remove the purpose of using `run_self_reflection_…
YusakuNo1 Nov 19, 2025
d47d336
Merge branch 'users/daviwu/self_reflection' of github.com:microsoft/a…
YusakuNo1 Nov 19, 2025
9f1ffe2
Update python/samples/getting_started/evaluation/azure_ai_foundry/eva…
YusakuNo1 Nov 19, 2025
04c7bce
Merge branch 'main' into users/daviwu/self_reflection
YusakuNo1 Nov 19, 2025
1 change: 1 addition & 0 deletions python/samples/README.md
@@ -185,6 +185,7 @@ This directory contains samples demonstrating the capabilities of Microsoft Agen
| File | Description |
|------|-------------|
| [`getting_started/evaluation/azure_ai_foundry/red_team_agent_sample.py`](./getting_started/evaluation/azure_ai_foundry/red_team_agent_sample.py) | Red team agent evaluation sample for Azure AI Foundry |
| [`getting_started/evaluation/azure_ai_foundry/evaluation/self_reflection.py`](./getting_started/evaluation/azure_ai_foundry/evaluation/self_reflection.py) | LLM self-reflection with AI Foundry graders example |

## MCP (Model Context Protocol)

@@ -0,0 +1,2 @@
AZURE_OPENAI_ENDPOINT="..."
AZURE_OPENAI_API_KEY="..."
@@ -0,0 +1,75 @@
# Self-Reflection Evaluation Sample

This sample demonstrates the self-reflection pattern using Agent Framework and Azure AI Foundry's Groundedness Evaluator. For details, see [Reflexion: Language Agents with Verbal Reinforcement Learning](https://arxiv.org/abs/2303.11366) (NeurIPS 2023).

## Overview

**What it demonstrates:**
- Iterative self-reflection loop that automatically improves responses based on groundedness evaluation
- Batch processing of prompts from Parquet files with progress tracking
- Using `AzureOpenAIChatClient` with Azure CLI authentication
- Comprehensive summary statistics and detailed result tracking

## Prerequisites

### Azure Resources
- **Azure OpenAI**: Deploy models (default: gpt-4.1 for both agent and judge)
- **Azure CLI**: Run `az login` to authenticate

### Python Environment
```bash
pip install agent-framework-core azure-ai-evaluation pandas --pre
```

### Environment Variables
```bash
# .env file
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key # Optional when authenticating with Azure CLI
```
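
As a minimal sketch of how these two variables might be consumed, the helper below treats the endpoint as required and the API key as optional, matching the comment in the `.env` example. The function name `load_azure_openai_settings` is hypothetical and is not taken from the sample itself.

```python
import os
from typing import Mapping, Optional, Tuple


def load_azure_openai_settings(
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, Optional[str]]:
    """Return (endpoint, api_key); api_key is None when it is not set.

    Hypothetical helper for illustration only, not part of the sample.
    """
    endpoint = env.get("AZURE_OPENAI_ENDPOINT", "").strip()
    if not endpoint:
        raise ValueError("AZURE_OPENAI_ENDPOINT is not set")
    # An empty or missing key means the caller should fall back to
    # another credential, such as Azure CLI authentication.
    api_key = env.get("AZURE_OPENAI_API_KEY", "").strip() or None
    return endpoint, api_key
```

Passing the environment as a mapping keeps the helper easy to test without touching the real process environment.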

## Running the Sample

```bash
# Basic usage
python self_reflection.py

# With options
python self_reflection.py --input my_prompts.parquet \
    --output results.parquet \
    --max-reflections 5 \
    -n 10
```

**CLI Options:**
- `--input`, `-i`: Input parquet file
- `--output`, `-o`: Output parquet file
- `--agent-model`, `-m`: Agent model name (default: gpt-4.1)
- `--judge-model`, `-e`: Evaluator model name (default: gpt-4.1)
- `--max-reflections`: Max iterations (default: 3)
- `--limit`, `-n`: Process only first N prompts
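
The options listed above could be declared with `argparse` roughly as follows. This is a sketch based only on the flag names and defaults in the list; the function name `build_parser`, the help strings, and the absence of defaults for `--input` and `--output` are assumptions, and the actual script may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI options."""
    parser = argparse.ArgumentParser(
        description="Self-reflection evaluation sample (illustrative sketch)"
    )
    parser.add_argument("--input", "-i", help="Input parquet file")
    parser.add_argument("--output", "-o", help="Output parquet file")
    parser.add_argument("--agent-model", "-m", default="gpt-4.1",
                        help="Agent model name")
    parser.add_argument("--judge-model", "-e", default="gpt-4.1",
                        help="Evaluator model name")
    parser.add_argument("--max-reflections", type=int, default=3,
                        help="Max iterations")
    parser.add_argument("--limit", "-n", type=int, default=None,
                        help="Process only first N prompts")
    return parser
```

Parsing the "With options" command line from the example above with this parser would yield `max_reflections == 5` and `limit == 10`, with both model names left at `gpt-4.1`.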

## Understanding Results

The agent iteratively improves responses:
1. Generate initial response
2. Evaluate groundedness (1-5 scale)
3. If the score is below 5, feed the evaluation back to the agent and retry
4. Stop at the maximum number of iterations or on a perfect score (5/5)

**Example output:**
```
[1/31] Processing prompt 0...
Self-reflection iteration 1/3...
Groundedness score: 3/5
Self-reflection iteration 2/3...
Groundedness score: 5/5
✓ Perfect groundedness score achieved!
✓ Completed with score: 5/5 (best at iteration 2/3)
```
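
The four steps under "Understanding Results" can be sketched as a plain Python loop. This is an illustration under stated assumptions, not the sample's implementation: `generate` and `evaluate` are hypothetical callables standing in for the agent call and the groundedness evaluator, and the feedback wording is invented for the example. The sketch also tracks the best-scoring iteration, as the "best at iteration 2/3" line in the example output suggests.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReflectionResult:
    best_response: str
    best_score: int
    best_iteration: int
    iterations_run: int


def run_reflection_loop(
    prompt: str,
    generate: Callable[[str, Optional[str]], str],  # hypothetical agent call
    evaluate: Callable[[str, str], int],            # hypothetical 1-5 grader
    max_reflections: int = 3,
    perfect_score: int = 5,
) -> ReflectionResult:
    best_response, best_score, best_iteration = "", 0, 0
    iterations_run = 0
    feedback: Optional[str] = None
    for iteration in range(1, max_reflections + 1):
        iterations_run = iteration
        # Steps 1 and 3: generate a response, using feedback on retries.
        response = generate(prompt, feedback)
        # Step 2: score groundedness on the 1-5 scale.
        score = evaluate(prompt, response)
        if score > best_score:
            best_response, best_score, best_iteration = response, score, iteration
        # Step 4: stop early on a perfect score.
        if score >= perfect_score:
            break
        feedback = (
            f"The previous response scored {score}/{perfect_score} for "
            "groundedness. Revise it to rely only on the provided context."
        )
    return ReflectionResult(best_response, best_score, best_iteration, iterations_run)
```

Keeping the best response rather than the last one guards against a later iteration scoring lower than an earlier one.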

## Related Resources

- [Reflexion Paper](https://arxiv.org/abs/2303.11366)
- [Azure AI Evaluation SDK](https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk)
- [Agent Framework](https://github.com/microsoft/agent-framework)