Pull request overview
This PR introduces a standardized “component initializer” pattern for AIRT setup by adding a scorer initializer alongside the existing target initializer approach, and refactors the scorer-evaluation script to rely on registry-registered scorers.
Changes:
- Added pyrit/setup/initializers/components/ with dedicated target/scorer initializer modules and updated package exports.
- Introduced AIRTScorerInitializer with a centralized SCORER_CONFIGS list for evaluation scorers.
- Refactored build_scripts/evaluate_scorers.py to initialize via AIRT initializers and iterate scorers from ScorerRegistry.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/unit/setup/test_airt_targets_initializer.py | Updates imports to the new components.targets module path. |
| tests/unit/setup/test_airt_scorer_initializer.py | Adds unit tests for AIRTScorerInitializer behavior and config coverage. |
| pyrit/setup/initializers/components/targets.py | New module defining TARGET_CONFIGS and AIRTTargetInitializer registration logic. |
| pyrit/setup/initializers/components/scorers.py | New module defining SCORER_CONFIGS and AIRTScorerInitializer registration logic. |
| pyrit/setup/initializers/components/__init__.py | Exposes component initializer types via __all__. |
| pyrit/setup/initializers/__init__.py | Re-exports AIRTScorerInitializer and updates AIRTTargetInitializer import path. |
| build_scripts/evaluate_scorers.py | Uses AIRTScorerInitializer + ScorerRegistry instead of hand-built scorer instances. |
Comments suppressed due to low confidence (1)
build_scripts/evaluate_scorers.py:72
Scorer.evaluate_async() defaults to update_registry_behavior=SKIP_IF_EXISTS, so this script may return cached metrics without re-running evaluations (and still prints “Evaluation complete and saved!”). If the intent is to benchmark scorers on each run, pass update_registry_behavior=RegistryUpdateBehavior.ALWAYS_UPDATE (and import the enum) or adjust the status messaging to reflect when cached results were used.
try:
    print(" Status: Running evaluations...")
    results = await scorer.evaluate_async(
        num_scorer_trials=3,
        max_concurrency=10,
    )
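The caching concern above can be illustrated with a minimal, self-contained sketch. Everything in it is a stand-in written for illustration and is not PyRIT's implementation: the `UpdateBehavior` enum, the in-memory `_saved_metrics` dict, and the `evaluate` and `status_message` helpers are hypothetical, and only the member names SKIP_IF_EXISTS and ALWAYS_UPDATE are borrowed from the comment. It shows one way a caller could tell cached results apart from a fresh run and word the status message accordingly.

```python
import enum
from typing import Callable, Dict, Tuple


class UpdateBehavior(enum.Enum):
    """Stand-in enum; only the member names come from the review comment."""

    SKIP_IF_EXISTS = "skip_if_exists"
    ALWAYS_UPDATE = "always_update"


# Hypothetical in-memory store of previously saved metrics, keyed by scorer name.
_saved_metrics: Dict[str, float] = {}


def evaluate(
    name: str,
    run: Callable[[], float],
    behavior: UpdateBehavior = UpdateBehavior.SKIP_IF_EXISTS,
) -> Tuple[float, bool]:
    """Return (metric, from_cache); only call run() when a fresh result is needed."""
    if behavior is UpdateBehavior.SKIP_IF_EXISTS and name in _saved_metrics:
        return _saved_metrics[name], True
    metric = run()
    _saved_metrics[name] = metric
    return metric, False


def status_message(from_cache: bool) -> str:
    """Pick a status line that says whether an evaluation actually ran."""
    if from_cache:
        return "Status: Using previously saved metrics (evaluation skipped)."
    return "Status: Evaluation complete and saved!"
```

With the default behavior, a second call for the same name returns the saved metric without invoking `run`, which is the situation in which an unconditional "complete and saved" message would mislead.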
def _make_gpt4o_target(*, temperature: float | None = None) -> OpenAIChatTarget:
    """
    Create an OpenAIChatTarget from AZURE_OPENAI_GPT4O environment variables.

    Args:
        temperature: Optional temperature override for the target.

    Returns:
        OpenAIChatTarget: A configured chat target.
    """
    kwargs: dict[str, Any] = {
        "endpoint": os.environ.get("AZURE_OPENAI_GPT4O_ENDPOINT"),
        "api_key": os.environ.get("AZURE_OPENAI_GPT4O_KEY"),
        "model_name": os.environ.get("AZURE_OPENAI_GPT4O_MODEL"),
    }
    underlying = os.environ.get("AZURE_OPENAI_GPT4O_UNDERLYING_MODEL")
    if underlying:
        kwargs["underlying_model"] = underlying
    if temperature is not None:
        kwargs["temperature"] = temperature
    return OpenAIChatTarget(**kwargs)
_make_gpt4o_target passes None values through to OpenAIChatTarget, which then raises errors referencing OPENAI_CHAT_* env vars (because those are the target’s defaults). That makes skip/warning messages misleading when the intended configuration is via AZURE_OPENAI_GPT4O_*. Consider validating the required AZURE_OPENAI_GPT4O_* vars up front and raising a ValueError that references the AZURE var names, or using default_values.get_required_value with those env var names before constructing the target.
Description
Creates a standardized scorer initialization pattern mirroring the existing AIRTTargetInitializer approach.
- Added pyrit/setup/initializers/components/ subdirectory
- Moved airt_targets.py → components/targets.py, renamed TargetConfig → AIRTTargetConfig
- Added components/scorers.py with AIRTScorerInitializer and 21 scorer configs
- Updated __init__.py exports for new module paths
- Refactored evaluate_scorers.py to use TargetRegistry for base targets and wire in both initializers
Tests and Documentation
- Added tests/unit/setup/test_airt_scorer_initializer.py
- Updated tests/unit/setup/test_airt_targets_initializer.py for new import paths
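For readers unfamiliar with the pattern, a config-list-driven initializer of the kind this PR describes can be sketched roughly as follows. This is a generic illustration under stated assumptions, not the code in components/scorers.py: `ComponentConfig`, `register_components`, and the plain dict used as a registry are hypothetical names, and the real AIRTScorerInitializer and ScorerRegistry may be structured differently.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class ComponentConfig:
    """Hypothetical config entry: a name, the env vars it needs, and a factory."""

    name: str
    required_env_vars: Tuple[str, ...]
    factory: Callable[[], object]


def register_components(
    configs: Iterable[ComponentConfig],
    registry: Dict[str, object],
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Register every config whose env vars are set; return the skipped names."""
    env = os.environ if environ is None else environ
    skipped: List[str] = []
    for config in configs:
        missing = [var for var in config.required_env_vars if not env.get(var)]
        if missing:
            # Warn and move on instead of failing the whole initialization.
            print(f"Skipping {config.name}: missing {', '.join(missing)}")
            skipped.append(config.name)
            continue
        registry[config.name] = config.factory()
    return skipped
```

Keeping the entries in one list means a script such as evaluate_scorers.py only needs to iterate whatever ended up in the registry rather than constructing each scorer by hand.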