
FEAT: Update evaluate_scorers#1406

Open
varunj-msft wants to merge 2 commits into Azure:main from varunj-msft:varunj-msft/7366-Update-evaluate_scorers

Conversation

@varunj-msft
Contributor

Description

Creates a standardized scorer initialization pattern mirroring the existing AIRTTargetInitializer approach.

  • Created pyrit/setup/initializers/components/ subdirectory
  • Moved airt_targets.py to components/targets.py, renamed TargetConfig to AIRTTargetConfig
  • Created components/scorers.py with AIRTScorerInitializer and 21 scorer configs
  • Updated __init__.py exports for new module paths
  • Refactored evaluate_scorers.py to use TargetRegistry for base targets and wire in both initializers
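
The initializer-plus-registry pattern these bullets describe can be sketched roughly as follows. This is a minimal self-contained model, not PyRIT's actual code: the config fields, registry shape, and the two example scorer configs are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

@dataclass
class AIRTScorerConfig:
    """Hypothetical stand-in for one entry in SCORER_CONFIGS."""
    name: str                      # registry key for the scorer
    factory: Callable[..., Any]    # callable that builds the scorer instance
    kwargs: Dict[str, Any] = field(default_factory=dict)

class ScorerRegistry:
    """Minimal registry: maps scorer names to constructed instances."""
    _scorers: Dict[str, Any] = {}

    @classmethod
    def register(cls, name: str, scorer: Any) -> None:
        cls._scorers[name] = scorer

    @classmethod
    def all(cls) -> Dict[str, Any]:
        return dict(cls._scorers)

class AIRTScorerInitializer:
    """Walks a central SCORER_CONFIGS list and registers each scorer."""
    SCORER_CONFIGS = [
        AIRTScorerConfig(name="refusal", factory=lambda threshold=0.5: {"kind": "refusal", "threshold": threshold}),
        AIRTScorerConfig(name="harm", factory=lambda: {"kind": "harm"}),
    ]

    def initialize(self) -> None:
        for config in self.SCORER_CONFIGS:
            ScorerRegistry.register(config.name, config.factory(**config.kwargs))

AIRTScorerInitializer().initialize()
# An evaluation script can now iterate registered scorers instead of hand-building them.
print(sorted(ScorerRegistry.all()))
```

The value of the pattern is that the evaluation script no longer needs to know how each scorer is constructed; it just iterates whatever the initializer registered.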

Tests and Documentation

  • Added tests/unit/setup/test_airt_scorer_initializer.py
  • Updated tests/unit/setup/test_airt_targets_initializer.py for new import paths

@romanlutz changed the title from FEAT: Update evaulate_scorers to FEAT: Update evaluate_scorers Feb 26, 2026
Copilot AI review requested due to automatic review settings February 27, 2026 00:02
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a standardized “component initializer” pattern for AIRT setup by adding a scorer initializer alongside the existing target initializer approach, and refactors the scorer-evaluation script to rely on registry-registered scorers.

Changes:

  • Added pyrit/setup/initializers/components/ with dedicated target/scorer initializer modules and updated package exports.
  • Introduced AIRTScorerInitializer with a centralized SCORER_CONFIGS list for evaluation scorers.
  • Refactored build_scripts/evaluate_scorers.py to initialize via AIRT initializers and iterate scorers from ScorerRegistry.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

Summary per file:
  • tests/unit/setup/test_airt_targets_initializer.py: Updates imports to the new components.targets module path.
  • tests/unit/setup/test_airt_scorer_initializer.py: Adds unit tests for AIRTScorerInitializer behavior and config coverage.
  • pyrit/setup/initializers/components/targets.py: New module defining TARGET_CONFIGS and AIRTTargetInitializer registration logic.
  • pyrit/setup/initializers/components/scorers.py: New module defining SCORER_CONFIGS and AIRTScorerInitializer registration logic.
  • pyrit/setup/initializers/components/__init__.py: Exposes component initializer types via __all__.
  • pyrit/setup/initializers/__init__.py: Re-exports AIRTScorerInitializer and updates the AIRTTargetInitializer import path.
  • build_scripts/evaluate_scorers.py: Uses AIRTScorerInitializer + ScorerRegistry instead of hand-built scorer instances.
Comments suppressed due to low confidence (1)

build_scripts/evaluate_scorers.py:72

  • Scorer.evaluate_async() defaults to update_registry_behavior=SKIP_IF_EXISTS, so this script may return cached metrics without re-running evaluations (and still prints “Evaluation complete and saved!”). If the intent is to benchmark scorers on each run, pass update_registry_behavior=RegistryUpdateBehavior.ALWAYS_UPDATE (and import the enum) or adjust the status messaging to reflect when cached results were used.
        try:
            print("  Status: Running evaluations...")
            results = await scorer.evaluate_async(
                num_scorer_trials=3,
                max_concurrency=10,
            )

Comment on lines +57 to +77
def _make_gpt4o_target(*, temperature: float | None = None) -> OpenAIChatTarget:
    """
    Create an OpenAIChatTarget from AZURE_OPENAI_GPT4O environment variables.

    Args:
        temperature: Optional temperature override for the target.

    Returns:
        OpenAIChatTarget: A configured chat target.
    """
    kwargs: dict[str, Any] = {
        "endpoint": os.environ.get("AZURE_OPENAI_GPT4O_ENDPOINT"),
        "api_key": os.environ.get("AZURE_OPENAI_GPT4O_KEY"),
        "model_name": os.environ.get("AZURE_OPENAI_GPT4O_MODEL"),
    }
    underlying = os.environ.get("AZURE_OPENAI_GPT4O_UNDERLYING_MODEL")
    if underlying:
        kwargs["underlying_model"] = underlying
    if temperature is not None:
        kwargs["temperature"] = temperature
    return OpenAIChatTarget(**kwargs)

Copilot AI Feb 27, 2026


_make_gpt4o_target passes None values through to OpenAIChatTarget, which then raises errors referencing OPENAI_CHAT_* env vars (because those are the target’s defaults). That makes skip/warning messages misleading when the intended configuration is via AZURE_OPENAI_GPT4O_*. Consider validating the required AZURE_OPENAI_GPT4O_* vars up front and raising a ValueError that references the AZURE var names, or using default_values.get_required_value with those env var names before constructing the target.


4 participants