
Python: Fix Eval samples #4033

Merged
eavanvalkenburg merged 4 commits into microsoft:main from eavanvalkenburg:eval_samples
Feb 18, 2026

Conversation

@eavanvalkenburg
Member

Motivation and Context

Fixes the samples that use evals: both the red teaming and the self reflection samples.

Description

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI review requested due to automatic review settings February 18, 2026 12:52
@markwallace-microsoft markwallace-microsoft added documentation Improvements or additions to documentation python labels Feb 18, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes the Python evaluation samples (self_reflection and red_teaming) by updating them to use the correct APIs and dependencies.

Changes:

  • Updates dependency versions in uv.lock (anthropic, github-copilot-sdk, mem0ai, pandas, uv)
  • Migrates self_reflection sample from AzureOpenAIChatClient to AzureOpenAIResponsesClient
  • Updates default models from gpt-4.1 to gpt-5.2
  • Improves file path handling using Path for better cross-platform compatibility
  • Adds async project client support for proper Azure AI Foundry integration
  • Updates red_team_agent_sample callback signature to match the expected interface
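The Path-based file handling called out above can be sketched as follows. This is a minimal illustration of the pattern, not code from the PR; the directory and file names are made up for the example.

```python
from pathlib import Path

# Joining with "/" lets pathlib pick the right separator per OS, so the
# same code runs unchanged on Windows and POSIX. (Directory names here
# are illustrative, not the sample's actual layout.)
base = Path("samples") / "evaluation" / "self_reflection"
report = base / "report.json"

# as_posix() normalizes to forward slashes for display/serialization.
print(report.as_posix())  # samples/evaluation/self_reflection/report.json

# The typical fix described in the PR: resolve data files relative to the
# script itself instead of the current working directory, e.g.:
#   data = Path(__file__).resolve().parent / "data.json"
```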

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| python/uv.lock | Updates package versions: anthropic (0.80.0→0.81.0), github-copilot-sdk (0.1.24→0.1.25), mem0ai (1.0.3→1.0.4), pandas (3.0.0→3.0.1), uv (0.10.3→0.10.4) |
| python/samples/05-end-to-end/evaluation/self_reflection/self_reflection.py | Migrates from AzureOpenAIChatClient to AzureOpenAIResponsesClient, adds async project client, improves path handling, updates default models |
| python/samples/05-end-to-end/evaluation/self_reflection/README.md | Updates documentation to reflect AzureOpenAIResponsesClient usage and gpt-5.2 models |
| python/samples/05-end-to-end/evaluation/red_teaming/red_team_agent_sample.py | Adds PEP 723 metadata, fixes callback signature to match RedTeam API expectations, improves error handling |
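The PEP 723 metadata and the callback-signature fix for red_team_agent_sample.py can be sketched together like this. It is a hedged illustration only: the four-parameter async callback shape (`messages`, `stream`, `session_state`, `context`) is an assumption about what the RedTeam evaluator expects, and the echo logic stands in for a real model call.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []  # the real sample would list its evaluation dependencies
# ///
"""Sketch of a red-team target callback (assumed signature, not from the PR)."""
import asyncio
from typing import Any


async def red_team_callback(
    messages: dict[str, Any],
    stream: bool = False,
    session_state: Any = None,
    context: Any = None,
) -> dict[str, Any]:
    # The evaluator passes the conversation so far; the callback returns it
    # with the target's reply appended, in the same chat-protocol shape.
    prompt = messages["messages"][-1]["content"]
    messages["messages"].append({"role": "assistant", "content": f"echo: {prompt}"})
    return {"messages": messages["messages"]}


if __name__ == "__main__":
    out = asyncio.run(
        red_team_callback({"messages": [{"role": "user", "content": "hi"}]})
    )
    print(out["messages"][-1]["content"])  # echo: hi
```

In the real sample, the echo line would be replaced by a call to the agent under test; keeping the signature to exactly these four parameters is what makes the callback compatible with the evaluator's invocation.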

@crickman crickman moved this to In Review in Agent Framework Feb 18, 2026
@crickman crickman added the samples Issue relates to the samples label Feb 18, 2026
@eavanvalkenburg eavanvalkenburg marked this pull request as draft February 18, 2026 16:04
@eavanvalkenburg eavanvalkenburg marked this pull request as ready for review February 18, 2026 19:39
@eavanvalkenburg eavanvalkenburg requested a review from a team as a code owner February 18, 2026 19:39
@markwallace-microsoft
Member

Python Test Coverage

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| packages/core/agent_framework/openai/_responses_client.py | 624 | 78 | 87% | 291–294, 298–299, 302–303, 309–310, 315, 328–334, 355, 363, 386, 549, 552, 607, 611, 613, 615, 617, 693, 703, 708, 751, 832, 849, 862, 1016, 1021, 1025–1027, 1031–1032, 1055, 1124, 1146–1147, 1162–1163, 1181–1182, 1320–1321, 1337, 1339, 1418–1426, 1545, 1600, 1615, 1654–1655, 1657–1659, 1673–1675, 1685–1686, 1692, 1707 |
| TOTAL | 21193 | 3328 | 84% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 4176 | 239 💤 | 0 ❌ | 0 🔥 | 1m 13s ⏱️ |

@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Feb 18, 2026
Merged via the queue into microsoft:main with commit aab80d9 Feb 18, 2026
26 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in Agent Framework Feb 18, 2026