
Python: Fix Eval samples #4033

Merged
eavanvalkenburg merged 4 commits into microsoft:main from eavanvalkenburg:eval_samples
Feb 18, 2026

Conversation

@eavanvalkenburg
Member

Motivation and Context

Fixes the samples that use evals: both the red teaming and the self reflection samples.

Description

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? If yes, add "[BREAKING]" prefix to the title of the PR.

Copilot AI review requested due to automatic review settings February 18, 2026 12:52
@markwallace-microsoft markwallace-microsoft added documentation Improvements or additions to documentation python labels Feb 18, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes the Python evaluation samples (self_reflection and red_teaming) by updating them to use the correct APIs and dependencies.

Changes:

  • Updates dependency versions in uv.lock (anthropic, github-copilot-sdk, mem0ai, pandas, uv)
  • Migrates self_reflection sample from AzureOpenAIChatClient to AzureOpenAIResponsesClient
  • Updates default models from gpt-4.1 to gpt-5.2
  • Improves file path handling using Path for better cross-platform compatibility
  • Adds async project client support for proper Azure AI Foundry integration
  • Updates red_team_agent_sample callback signature to match the expected interface
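The Path-based file handling called out above can be sketched as follows. This is a minimal illustration of the pattern, not code from the PR; the directory and file names are made up for the example.

```python
from pathlib import Path

# Joining with "/" lets pathlib pick the right separator per OS, so the
# same code runs unchanged on Windows and POSIX. (Directory names here
# are illustrative, not the sample's actual layout.)
base = Path("samples") / "evaluation" / "self_reflection"
report = base / "report.json"

# as_posix() normalizes to forward slashes for display/serialization.
print(report.as_posix())  # samples/evaluation/self_reflection/report.json

# The typical fix described in the PR: resolve data files relative to the
# script itself instead of the current working directory, e.g.:
#   data = Path(__file__).resolve().parent / "data.json"
```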

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| python/uv.lock | Updates package versions: anthropic (0.80.0→0.81.0), github-copilot-sdk (0.1.24→0.1.25), mem0ai (1.0.3→1.0.4), pandas (3.0.0→3.0.1), uv (0.10.3→0.10.4) |
| python/samples/05-end-to-end/evaluation/self_reflection/self_reflection.py | Migrates from AzureOpenAIChatClient to AzureOpenAIResponsesClient, adds async project client, improves path handling, updates default models |
| python/samples/05-end-to-end/evaluation/self_reflection/README.md | Updates documentation to reflect AzureOpenAIResponsesClient usage and gpt-5.2 models |
| python/samples/05-end-to-end/evaluation/red_teaming/red_team_agent_sample.py | Adds PEP 723 metadata, fixes callback signature to match RedTeam API expectations, improves error handling |
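The PEP 723 metadata and the callback-signature fix for red_team_agent_sample.py can be sketched together like this. It is a hedged illustration only: the four-parameter async callback shape (`messages`, `stream`, `session_state`, `context`) is an assumption about what the RedTeam evaluator expects, and the echo logic stands in for a real model call.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = []  # the real sample would list its evaluation dependencies
# ///
"""Sketch of a red-team target callback (assumed signature, not from the PR)."""
import asyncio
from typing import Any


async def red_team_callback(
    messages: dict[str, Any],
    stream: bool = False,
    session_state: Any = None,
    context: Any = None,
) -> dict[str, Any]:
    # The evaluator passes the conversation so far; the callback returns it
    # with the target's reply appended, in the same chat-protocol shape.
    prompt = messages["messages"][-1]["content"]
    messages["messages"].append({"role": "assistant", "content": f"echo: {prompt}"})
    return {"messages": messages["messages"]}


if __name__ == "__main__":
    out = asyncio.run(
        red_team_callback({"messages": [{"role": "user", "content": "hi"}]})
    )
    print(out["messages"][-1]["content"])  # echo: hi
```

In the real sample, the echo line would be replaced by a call to the agent under test; keeping the signature to exactly these four parameters is what makes the callback compatible with the evaluator's invocation.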

@crickman crickman moved this to In Review in Agent Framework Feb 18, 2026
@crickman crickman added the samples Issue relates to the samples label Feb 18, 2026
@eavanvalkenburg eavanvalkenburg marked this pull request as draft February 18, 2026 16:04
@eavanvalkenburg eavanvalkenburg marked this pull request as ready for review February 18, 2026 19:39
@eavanvalkenburg eavanvalkenburg requested a review from a team as a code owner February 18, 2026 19:39
@markwallace-microsoft
Member

Python Test Coverage

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| packages/core/agent_framework/openai/_responses_client.py | 624 | 78 | 87% | 291–294, 298–299, 302–303, 309–310, 315, 328–334, 355, 363, 386, 549, 552, 607, 611, 613, 615, 617, 693, 703, 708, 751, 832, 849, 862, 1016, 1021, 1025–1027, 1031–1032, 1055, 1124, 1146–1147, 1162–1163, 1181–1182, 1320–1321, 1337, 1339, 1418–1426, 1545, 1600, 1615, 1654–1655, 1657–1659, 1673–1675, 1685–1686, 1692, 1707 |
| TOTAL | 21193 | 3328 | 84% | |

Python Unit Test Overview

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 4176 | 239 💤 | 0 ❌ | 0 🔥 | 1m 13s ⏱️ |

@eavanvalkenburg eavanvalkenburg added this pull request to the merge queue Feb 18, 2026
Merged via the queue into microsoft:main with commit aab80d9 Feb 18, 2026
26 checks passed
@github-project-automation github-project-automation bot moved this from In Review to Done in Agent Framework Feb 18, 2026