Open
Conversation
src/orchestrator/batch.ts
Outdated
Comment on lines
159
to
161
| if (questionIds && questionIds.length > 0) { | ||
| targetQuestionIds = questionIds | ||
| logger.info(`Using explicit questionIds: ${questionIds.length} questions`) |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
25e2b45 to
28a1861
Compare
Comment on lines
+358
to
+364
| if (!questionIdValidation || questionIdValidation.invalid.length > 0) { | ||
| setError("Please validate patterns before starting the run") | ||
| return | ||
| } | ||
|
|
||
| // Use the expanded question IDs from validation | ||
| questionIds = questionIdValidation.expanded |
There was a problem hiding this comment.
Bug: Changing the benchmark does not clear the question ID validation state, allowing submission with stale validation data from a different benchmark.
Severity: MEDIUM
Suggested Fix
Add a useEffect hook that listens for changes to form.benchmark. When the benchmark is changed, the effect should clear the questionIdValidation state, forcing the user to re-validate their question IDs against the new benchmark before they can submit the form.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: ui/app/runs/new/page.tsx#L358-L364
Potential issue: When a user validates question IDs for a specific benchmark and then
changes the benchmark without re-validating, the validation state
(`questionIdValidation`) is not cleared. The form allows submission using the stale
validation results from the original benchmark. If any question IDs from the original
benchmark exist in the new benchmark, the backend will silently accept them and execute
the run against an incorrect set of questions, leading to invalid results without user
awareness.
Member
|
I wonder if question ID is the right heuristic to build on especially because we want to make it interoperable between all benchmarks |
Member
|
Can we test this against Locomo and Convoman as well? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

No description provided.