
FEAT: Update evaluate_scorers#1406

Open
varunj-msft wants to merge 2 commits into Azure:main from varunj-msft:varunj-msft/7366-Update-evaluate_scorers

Conversation

@varunj-msft
Contributor

Description

Creates a standardized scorer initialization pattern mirroring the existing AIRTTargetInitializer approach.

  • Created pyrit/setup/initializers/components/ subdirectory
  • Moved airt_targets.py to components/targets.py, renamed TargetConfig to AIRTTargetConfig
  • Created components/scorers.py with AIRTScorerInitializer and 21 scorer configs
  • Updated __init__.py exports for new module paths
  • Refactored evaluate_scorers.py to use TargetRegistry for base targets and wire in both initializers
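
The initializer-plus-registry pattern these bullets describe can be sketched roughly as follows. This is a minimal self-contained model, not PyRIT's actual code: the config fields, registry shape, and the two example scorer configs are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class AIRTScorerConfig:
    """Hypothetical stand-in for one entry in SCORER_CONFIGS."""
    name: str                      # registry key for the scorer
    factory: Callable[..., Any]    # callable that builds the scorer instance
    kwargs: Dict[str, Any] = field(default_factory=dict)

class ScorerRegistry:
    """Minimal registry: maps scorer names to constructed instances."""
    _scorers: Dict[str, Any] = {}

    @classmethod
    def register(cls, name: str, scorer: Any) -> None:
        cls._scorers[name] = scorer

    @classmethod
    def all(cls) -> Dict[str, Any]:
        return dict(cls._scorers)

class AIRTScorerInitializer:
    """Walks a central SCORER_CONFIGS list and registers each scorer."""
    SCORER_CONFIGS = [
        AIRTScorerConfig(name="refusal", factory=lambda threshold=0.5: {"kind": "refusal", "threshold": threshold}),
        AIRTScorerConfig(name="harm", factory=lambda: {"kind": "harm"}),
    ]

    def initialize(self) -> None:
        for config in self.SCORER_CONFIGS:
            ScorerRegistry.register(config.name, config.factory(**config.kwargs))

AIRTScorerInitializer().initialize()
# An evaluation script can now iterate registered scorers instead of hand-building them.
print(sorted(ScorerRegistry.all()))
```

The value of the pattern is that the evaluation script no longer needs to know how each scorer is constructed; it just iterates whatever the initializer registered.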

Tests and Documentation

  • Added tests/unit/setup/test_airt_scorer_initializer.py
  • Updated tests/unit/setup/test_airt_targets_initializer.py for new import paths

@romanlutz changed the title from FEAT: Update evaulate_scorers to FEAT: Update evaluate_scorers Feb 26, 2026
Copilot AI review requested due to automatic review settings February 27, 2026 00:02
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a standardized “component initializer” pattern for AIRT setup by adding a scorer initializer alongside the existing target initializer approach, and refactors the scorer-evaluation script to rely on registry-registered scorers.

Changes:

  • Added pyrit/setup/initializers/components/ with dedicated target/scorer initializer modules and updated package exports.
  • Introduced AIRTScorerInitializer with a centralized SCORER_CONFIGS list for evaluation scorers.
  • Refactored build_scripts/evaluate_scorers.py to initialize via AIRT initializers and iterate scorers from ScorerRegistry.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

Summary per file:
  • tests/unit/setup/test_airt_targets_initializer.py: Updates imports to the new components.targets module path.
  • tests/unit/setup/test_airt_scorer_initializer.py: Adds unit tests for AIRTScorerInitializer behavior and config coverage.
  • pyrit/setup/initializers/components/targets.py: New module defining TARGET_CONFIGS and AIRTTargetInitializer registration logic.
  • pyrit/setup/initializers/components/scorers.py: New module defining SCORER_CONFIGS and AIRTScorerInitializer registration logic.
  • pyrit/setup/initializers/components/__init__.py: Exposes component initializer types via __all__.
  • pyrit/setup/initializers/__init__.py: Re-exports AIRTScorerInitializer and updates the AIRTTargetInitializer import path.
  • build_scripts/evaluate_scorers.py: Uses AIRTScorerInitializer + ScorerRegistry instead of hand-built scorer instances.
Comments suppressed due to low confidence (1)

build_scripts/evaluate_scorers.py:72

  • Scorer.evaluate_async() defaults to update_registry_behavior=SKIP_IF_EXISTS, so this script may return cached metrics without re-running evaluations (and still prints “Evaluation complete and saved!”). If the intent is to benchmark scorers on each run, pass update_registry_behavior=RegistryUpdateBehavior.ALWAYS_UPDATE (and import the enum) or adjust the status messaging to reflect when cached results were used.
        try:
            print("  Status: Running evaluations...")
            results = await scorer.evaluate_async(
                num_scorer_trials=3,
                max_concurrency=10,
            )

Comment on lines +57 to +77
def _make_gpt4o_target(*, temperature: float | None = None) -> OpenAIChatTarget:
    """
    Create an OpenAIChatTarget from AZURE_OPENAI_GPT4O environment variables.

    Args:
        temperature: Optional temperature override for the target.

    Returns:
        OpenAIChatTarget: A configured chat target.
    """
    kwargs: dict[str, Any] = {
        "endpoint": os.environ.get("AZURE_OPENAI_GPT4O_ENDPOINT"),
        "api_key": os.environ.get("AZURE_OPENAI_GPT4O_KEY"),
        "model_name": os.environ.get("AZURE_OPENAI_GPT4O_MODEL"),
    }
    underlying = os.environ.get("AZURE_OPENAI_GPT4O_UNDERLYING_MODEL")
    if underlying:
        kwargs["underlying_model"] = underlying
    if temperature is not None:
        kwargs["temperature"] = temperature
    return OpenAIChatTarget(**kwargs)

Copilot AI Feb 27, 2026


_make_gpt4o_target passes None values through to OpenAIChatTarget, which then raises errors referencing OPENAI_CHAT_* env vars (because those are the target’s defaults). That makes skip/warning messages misleading when the intended configuration is via AZURE_OPENAI_GPT4O_*. Consider validating the required AZURE_OPENAI_GPT4O_* vars up front and raising a ValueError that references the AZURE var names, or using default_values.get_required_value with those env var names before constructing the target.


4 participants