Conversation
pyrit/datasets/prompt_converters/scientific_obfuscation_converter.yaml
```jinja
## Mode-specific guidelines:

{% if mode == "academic" %}
```
I am confused. This includes only a specific section depending on the mode BUT at the end there's a combined mode. How will it know all the modes if we exclude most of them? Examples below also include all of them.
Also not sure if I understand the combined mode. In the class it's explicitly listed as a mode, but here it's a catchall, so "foobar" would resolve to a combined prompt. I feel like it would be better to just drop "combined" and refer to the default/wildcard as a combined mode
I added the combined mode to combine a couple of the methods into one. I did change to an "elif" rather than a catch-all "else", since any other mode should get caught as an exception!
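For reference, the elif structure being described would look roughly like this (section bodies elided; this is a sketch, not the actual template):

```jinja
{% if mode == "academic" %}
  ...academic guidelines...
{% elif mode == "technical" %}
  ...technical guidelines...
{% elif mode == "combined" %}
  ...guidelines combining several of the methods...
{% endif %}
```

With no final `else`, an unrecognized mode renders none of the sections, so validation has to happen in the converter class instead.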
```python
    Raises:
        ValueError: If an invalid mode is provided.
    """
    valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
```
Is there an easier way to check for this, given that it's a literal that's defined above?
I suggested one below by just attaching the valid modes to the class itself, but it's a nit so feel free to disregard
I suppose you could have both.
```python
from typing import Literal, get_args

ObfuscationMode = Literal[
    "academic", "technical", "smiles", "research", "reaction", "combined"
]
OBFUSCATION_MODES = set(get_args(ObfuscationMode))

def is_valid_mode(value: str) -> bool:
    return value in OBFUSCATION_MODES
```
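Combining this with the suggestion to attach the valid modes to the class itself might look like the following minimal sketch. The class body here is a simplified stand-in (the real converter subclasses `LLMGenericTextConverter` and takes more arguments); the point is that the `Literal` remains the single source of truth for both typing and runtime validation.

```python
from typing import Literal, get_args

# The Literal drives both the static type and the runtime check.
ObfuscationMode = Literal["academic", "technical", "smiles", "research", "reaction", "combined"]

class ScientificObfuscationConverter:
    # Attach the allowed modes to the class so validation, typing, and
    # error messages can never drift out of sync.
    VALID_MODES = frozenset(get_args(ObfuscationMode))

    def __init__(self, mode: str = "academic") -> None:
        if mode not in self.VALID_MODES:
            raise ValueError(f"Invalid mode '{mode}'. Must be one of: {sorted(self.VALID_MODES)}")
        self.mode = mode
```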
Pull request overview
Adds a new LLM-based prompt converter that rewrites prompts into “scientific/technical” phrasing across multiple modes, along with the seed prompt template, documentation wiring, and unit tests.
Changes:
- Introduces `ScientificObfuscationConverter` (mode-driven) backed by a new YAML seed prompt template.
- Exposes the converter via `pyrit.prompt_converter` exports and API docs.
- Adds unit tests and an example usage snippet in the converters documentation notebook.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `pyrit/prompt_converter/scientific_obfuscation_converter.py` | Implements the new LLM-based converter and identifier construction. |
| `pyrit/datasets/prompt_converters/scientific_obfuscation_converter.yaml` | Adds the mode-parameterized system prompt template used by the converter. |
| `pyrit/prompt_converter/__init__.py` | Exports the new converter from the `prompt_converter` package. |
| `tests/unit/converter/test_scientific_obfuscation_converter.py` | Adds unit tests validating initialization, mode validation, and conversion behavior. |
| `doc/code/converters/1_text_to_text_converters.py` | Documents example usage of the new converter in the text-to-text converters notebook source. |
| `doc/code/converters/1_text_to_text_converters.ipynb` | Adds the corresponding notebook cell content for the new converter example. |
| `doc/api.rst` | Adds the converter to the API reference list. |
Comments suppressed due to low confidence (2)
pyrit/prompt_converter/scientific_obfuscation_converter.py:23
- The PR title/description refer to a "Scientific Translation Converter", but the implementation and dataset are named "ScientificObfuscationConverter" / "scientific_obfuscation_converter". If this is intended to be a translation-style converter, consider aligning the naming (or update the PR description) to avoid confusion for API consumers and documentation readers.
```python
class ScientificObfuscationConverter(LLMGenericTextConverter):
    """
    Uses an LLM to transform simple or direct prompts into
```
pyrit/prompt_converter/scientific_obfuscation_converter.py:67
`valid_modes` duplicates the allowed values already defined in `ObfuscationMode`. To avoid the tuple and the type alias drifting out of sync, derive the runtime list from the type (e.g., `typing.get_args(ObfuscationMode)`) or centralize the allowed modes as a single constant reused for both validation and typing.
```python
valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
if mode not in valid_modes:
    raise ValueError(f"Invalid mode '{mode}'. Must be one of: {valid_modes}")
```
```python
if any(var not in kwargs for var in for_vars):
# Don't render if we're missing loop collection variables - preserve the template as-is
# Extract variable names from {% if var ... %} and {% elif var ... %} patterns
if_vars = re.findall(r"\{%[-\s]*(?:el)?if\s+(\w+)", self.value)
```
Rather than parsing here and having one YAML, is it possible to have multiple YAMLs, one per mode? Then we could leave seed.py untouched (I'm assuming this parsing is specific to this converter, and I'd prefer not to put it in seed since Seed is a generic data structure) and redirect to the corresponding YAML based on the mode in scientific_translation_converter.py.
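A rough sketch of that per-mode layout (the file names and helper here are hypothetical, not the actual dataset files):

```python
import pathlib

# Hypothetical per-mode template files, e.g.
# pyrit/datasets/prompt_converters/scientific_obfuscation_academic.yaml
TEMPLATE_DIR = pathlib.Path("pyrit/datasets/prompt_converters")

VALID_MODES = ("academic", "technical", "smiles", "research", "reaction", "combined")

def template_path_for_mode(mode: str) -> pathlib.Path:
    """Map a mode to its dedicated YAML template instead of parsing one big template."""
    if mode not in VALID_MODES:
        raise ValueError(f"Invalid mode '{mode}'. Must be one of: {VALID_MODES}")
    return TEMPLATE_DIR / f"scientific_obfuscation_{mode}.yaml"
```

This keeps Seed generic: the converter picks a template up front, and no Jinja parsing has to happen in the seed data structure.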
```python
# Load default template if not provided
prompt_template = (
    prompt_template
```
prompt_template as an input is weird to me, because we also have translation_mode: a user could pass a prompt_template that doesn't correspond to a mode at all, and translation_mode is useless if the prompt_template doesn't use it. So IMO we should either enforce the translation_mode in the prompt template, or not allow prompt_template at all (I'm on the fence on whether prompt_template is useful; you and Cristian would probably have more insight into that than me).
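If the template parameter is kept, one way to enforce the first option would be a small guard like the following. This is a sketch under the assumption that the template is Jinja and references the variable as `mode` (the function name and regex are illustrative, not existing PyRIT API):

```python
import re

def template_uses_mode(template_text: str) -> bool:
    # Hypothetical guard: reject custom templates that never reference the
    # `mode` variable, since translation_mode would then have no effect.
    # Matches either an expression ({{ mode }}) or a conditional ({% if mode ... %}).
    return re.search(r"\{\{\s*mode\s*\}\}|\{%[-\s]*(?:el)?if\s+mode\b", template_text) is not None
```

The converter could then raise a ValueError at construction time when a caller-supplied template fails this check, instead of silently ignoring the mode.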
Description
Adds a scientific translation converter to translate queries into various "scientific" modes.
Tests and Documentation
Added unit tests and added the converter to the text-to-text LLM converters notebook.