Skip to content

FEAT: Scientific Translation Converter#1379

Open
jbolor21 wants to merge 9 commits intoAzure:mainfrom
jbolor21:users/bjagdagdorj/science_converter
Open

FEAT: Scientific Translation Converter#1379
jbolor21 wants to merge 9 commits intoAzure:mainfrom
jbolor21:users/bjagdagdorj/science_converter

Conversation

@jbolor21
Copy link
Contributor

Description

Adding scientific translation converter to translate queries into various "scientific" modes

Tests and Documentation

Added unit tests and added converter into converters notebook for text->text using LLMs

@jbolor21 jbolor21 changed the title [DRAFT]: FEAT: Scientific Translation Converter FEAT: Scientific Translation Converter Feb 19, 2026

## Mode-specific guidelines:

{% if mode == "academic" %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused. This includes only a specific section depending on the mode BUT at the end there's a combined mode. How will it know all the modes if we exclude most of them? Examples below also include all of them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not sure if I understand the combined mode. In the class it's explicitly listed as a mode, but here it's a catchall, so "foobar" would resolve to a combined prompt. I feel like it would be better to just drop "combined" and refer to the default/wildcard as a combined mode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put the combined to combine a couple of the methods into 1. I did change to an "elif" rather than catch all "else" since any other mode should get caught as an exception!

Raises:
ValueError: If an invalid mode is provided.
"""
valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there an easier way to check for this given that it's a literal that's defined above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested one below by just attaching the valid modes to the class itself, but it's a nit so feel free to disregard

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you could have both.


from typing import Literal, get_args

ObfuscationMode = Literal[
    "academic", "technical", "smiles", "research", "reaction", "combined"
]

OBFUSCATION_MODES = set(get_args(ObfuscationMode))

def is_valid_mode(value: str) -> bool:
    return value in OBFUSCATION_MODES

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new LLM-based prompt converter that rewrites prompts into “scientific/technical” phrasing across multiple modes, along with the seed prompt template, documentation wiring, and unit tests.

Changes:

  • Introduces ScientificObfuscationConverter (mode-driven) backed by a new YAML seed prompt template.
  • Exposes the converter via pyrit.prompt_converter exports and API docs.
  • Adds unit tests and an example usage snippet in the converters documentation notebook.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pyrit/prompt_converter/scientific_obfuscation_converter.py Implements the new LLM-based converter and identifier construction.
pyrit/datasets/prompt_converters/scientific_obfuscation_converter.yaml Adds the mode-parameterized system prompt template used by the converter.
pyrit/prompt_converter/__init__.py Exports the new converter from the prompt_converter package.
tests/unit/converter/test_scientific_obfuscation_converter.py Adds unit tests validating initialization, mode validation, and conversion behavior.
doc/code/converters/1_text_to_text_converters.py Documents example usage of the new converter in the text-to-text converters notebook source.
doc/code/converters/1_text_to_text_converters.ipynb Adds the corresponding notebook cell content for the new converter example.
doc/api.rst Adds the converter to the API reference list.
Comments suppressed due to low confidence (2)

pyrit/prompt_converter/scientific_obfuscation_converter.py:23

  • The PR title/description refer to a "Scientific Translation Converter", but the implementation and dataset are named "ScientificObfuscationConverter" / "scientific_obfuscation_converter". If this is intended to be a translation-style converter, consider aligning the naming (or update the PR description) to avoid confusion for API consumers and documentation readers.
class ScientificObfuscationConverter(LLMGenericTextConverter):
    """
    Uses an LLM to transform simple or direct prompts into

pyrit/prompt_converter/scientific_obfuscation_converter.py:67

  • valid_modes duplicates the allowed values already defined in ObfuscationMode. To avoid the tuple and the type alias drifting out of sync, derive the runtime list from the type (e.g., typing.get_args(ObfuscationMode)) or centralize the allowed modes as a single constant reused for both validation and typing.
        valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
        if mode not in valid_modes:
            raise ValueError(f"Invalid mode '{mode}'. Must be one of: {valid_modes}")

Raises:
ValueError: If an invalid mode is provided.
"""
valid_modes = ("academic", "technical", "smiles", "research", "reaction", "combined")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested one below by just attaching the valid modes to the class itself, but it's a nit so feel free to disregard


## Mode-specific guidelines:

{% if mode == "academic" %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not sure if I understand the combined mode. In the class it's explicitly listed as a mode, but here it's a catchall, so "foobar" would resolve to a combined prompt. I feel like it would be better to just drop "combined" and refer to the default/wildcard as a combined mode

if any(var not in kwargs for var in for_vars):
# Don't render if we're missing loop collection variables - preserve the template as-is
# Extract variable names from {% if var ... %} and {% elif var ... %} patterns
if_vars = re.findall(r"\{%[-\s]*(?:el)?if\s+(\w+)", self.value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than parsing here and having one yaml, is it possible to have multiple yamls each for the mode and then we can leave the seed.py untouched (I'm assuming this parsing is specific to this converter which I would prefer not to do since seed is a generic data structure) and redirect to the corresponding yaml based on the mode in the scientific_translation_converter.py


# Load default template if not provided
prompt_template = (
prompt_template
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prompt_template as an input is weird to me bc we have translation_mode so a user could make a prompt_template that doesn't correspond to a mode at all / the translation_mode is useless if the prompt_template doesn't use it so imo we should enforce the translation_mode in the prompt template or not allow the prompt_template (on the fence on whether the prompt_template is useful; you and Cristian would probably have more insight into that than me)

@hannahwestra25 hannahwestra25 self-assigned this Feb 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants