Run specific Question IDs #26

Open

sreedharsreeram wants to merge 1 commit into main from 02-03_question_id

Conversation

@sreedharsreeram
Contributor

No description provided.

Comment on lines 159 to 161
if (questionIds && questionIds.length > 0) {
  targetQuestionIds = questionIds
  logger.info(`Using explicit questionIds: ${questionIds.length} questions`)

This comment was marked as outdated.

Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

Comment on lines +358 to +364
if (!questionIdValidation || questionIdValidation.invalid.length > 0) {
  setError("Please validate patterns before starting the run")
  return
}

// Use the expanded question IDs from validation
questionIds = questionIdValidation.expanded

Bug: Changing the benchmark does not clear the question ID validation state, allowing submission with stale validation data from a different benchmark.
Severity: MEDIUM

Suggested Fix

Add a useEffect hook that listens for changes to form.benchmark. When the benchmark is changed, the effect should clear the questionIdValidation state, forcing the user to re-validate their question IDs against the new benchmark before they can submit the form.
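A minimal sketch of that effect, assuming the page is a React function component where the selected benchmark lives in form state and the validation result is held via a setQuestionIdValidation setter (both names are assumptions inferred from the diff, not confirmed by it):

import { useEffect } from "react"

// Hypothetical sketch: inside the run-creation page component.
// Clearing the validation state whenever the benchmark changes forces the
// user to re-validate question IDs against the newly selected benchmark.
useEffect(() => {
  setQuestionIdValidation(null)
}, [form.benchmark])

Since the submit handler shown above already rejects a missing questionIdValidation, clearing the state is enough to block submission until the IDs have been re-validated.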

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: ui/app/runs/new/page.tsx#L358-L364

Potential issue: When a user validates question IDs for a specific benchmark and then changes the benchmark without re-validating, the validation state (`questionIdValidation`) is not cleared. The form allows submission using the stale validation results from the original benchmark. If any question IDs from the original benchmark exist in the new benchmark, the backend will silently accept them and execute the run against an incorrect set of questions, leading to invalid results without user awareness.

Member

Dhravya commented Feb 17, 2026

I wonder if question ID is the right heuristic to build on, especially because we want to make it interoperable across all benchmarks

Member

Dhravya commented Feb 17, 2026

Can we test this against Locomo and Convoman as well?
