Run specific Question ID's by sreedharsreeram · Pull Request #26 · supermemoryai/memorybench

sreedharsreeram · 2026-02-17T03:32:32Z

No description provided.

src/orchestrator/batch.ts

+    if (questionIds && questionIds.length > 0) {
+      targetQuestionIds = questionIds
+      logger.info(`Using explicit questionIds: ${questionIds.length} questions`)
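For context, the branch above could sit inside a selection helper along the following lines. This is only a sketch under stated assumptions: `selectTargetQuestionIds`, `allQuestionIds`, and `unknownIds` are hypothetical names that do not appear in the PR, and the fallback of running every question when no explicit list is given is assumed rather than confirmed by the diff.

```typescript
// Hypothetical helper; names and fallback behavior are illustrative,
// not taken from src/orchestrator/batch.ts.
interface Selection {
  targetQuestionIds: string[] // IDs that will actually be run
  unknownIds: string[] // requested IDs the benchmark does not contain
}

function selectTargetQuestionIds(
  allQuestionIds: string[],
  questionIds?: string[],
): Selection {
  // No explicit list: assume the run covers every question in the benchmark.
  if (!questionIds || questionIds.length === 0) {
    return { targetQuestionIds: allQuestionIds.slice(), unknownIds: [] }
  }

  const known = new Set(allQuestionIds)
  // De-duplicate while preserving the caller's order.
  const unique = Array.from(new Set(questionIds))

  return {
    targetQuestionIds: unique.filter((id) => known.has(id)),
    unknownIds: unique.filter((id) => !known.has(id)),
  }
}
```

Returning the unknown IDs separately, rather than dropping them silently, would let the caller log or reject a request that names questions the benchmark does not have.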


sreedharsreeram · 2026-02-17T03:41:14Z

Run specific Question ID's #26 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

sentry · 2026-02-17T04:16:56Z

ui/app/runs/new/page.tsx

+        if (!questionIdValidation || questionIdValidation.invalid.length > 0) {
+          setError("Please validate patterns before starting the run")
+          return
+        }
+
+        // Use the expanded question IDs from validation
+        questionIds = questionIdValidation.expanded
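The guard above relies on a validation result with `expanded` and `invalid` fields. A rough sketch of how such a result might be produced is shown here. The pattern syntax is an assumption: the PR does not show it, so this example treats a pattern as either an exact ID or a prefix ending in `*`. `validatePatterns`, `canStartRun`, and `knownIds` are hypothetical names.

```typescript
// Shape inferred from the diff: `expanded` and `invalid` are the only
// fields the PR excerpt references.
interface QuestionIdValidation {
  expanded: string[] // concrete question IDs matched by the patterns
  invalid: string[] // patterns that matched no known question
}

// Assumed pattern syntax: an exact ID, or a prefix followed by "*".
function validatePatterns(
  patterns: string[],
  knownIds: string[],
): QuestionIdValidation {
  const expanded: string[] = []
  const invalid: string[] = []
  const seen = new Set<string>()

  for (const raw of patterns) {
    const pattern = raw.trim()
    if (pattern === "") continue // ignore blank entries

    const matches = pattern.endsWith("*")
      ? knownIds.filter((id) => id.startsWith(pattern.slice(0, -1)))
      : knownIds.filter((id) => id === pattern)

    if (matches.length === 0) {
      invalid.push(pattern)
      continue
    }
    // Keep each matched ID once, in first-seen order.
    for (const id of matches) {
      if (!seen.has(id)) {
        seen.add(id)
        expanded.push(id)
      }
    }
  }
  return { expanded, invalid }
}

// Mirrors the submit guard in the diff: a run may start only after a
// validation exists and it reports no invalid patterns.
function canStartRun(validation: QuestionIdValidation | null): boolean {
  return validation !== null && validation.invalid.length === 0
}
```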


Bug: Changing the benchmark does not clear the question ID validation state, allowing submission with stale validation data from a different benchmark.
Severity: MEDIUM

Suggested Fix

Add a useEffect hook that listens for changes to form.benchmark. When the benchmark is changed, the effect should clear the questionIdValidation state, forcing the user to re-validate their question IDs against the new benchmark before they can submit the form.
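The state transition behind this suggestion can be sketched without React as a small reducer. This is an illustration under assumptions, not the page's actual code: `FormState`, `FormAction`, and `formReducer` are hypothetical, and only `benchmark` and `questionIdValidation` correspond to names in the review comment. In the component itself, the same reset would be what the suggested `useEffect` keyed on `form.benchmark` performs.

```typescript
interface QuestionIdValidation {
  expanded: string[]
  invalid: string[]
}

// Hypothetical form state; only the two fields relevant to the bug.
interface FormState {
  benchmark: string
  questionIdValidation: QuestionIdValidation | null
}

type FormAction =
  | { type: "setBenchmark"; benchmark: string }
  | { type: "setValidation"; validation: QuestionIdValidation }

function formReducer(state: FormState, action: FormAction): FormState {
  switch (action.type) {
    case "setBenchmark":
      // Re-selecting the same benchmark keeps the existing validation.
      if (action.benchmark === state.benchmark) return state
      // A different benchmark invalidates earlier results, so the user
      // has to validate again before the submit guard lets the run start.
      return { benchmark: action.benchmark, questionIdValidation: null }
    case "setValidation":
      return { ...state, questionIdValidation: action.validation }
  }
}
```

Clearing the validation in the same transition that changes the benchmark means there is no intermediate state in which the new benchmark is paired with stale results.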

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: ui/app/runs/new/page.tsx#L358-L364

Potential issue: When a user validates question IDs for a specific benchmark and then changes the benchmark without re-validating, the validation state (`questionIdValidation`) is not cleared. The form allows submission using the stale validation results from the original benchmark. If any question IDs from the original benchmark exist in the new benchmark, the backend will silently accept them and execute the run against an incorrect set of questions, leading to invalid results without user awareness.

Dhravya · 2026-02-17T18:42:07Z

I wonder if question ID is the right heuristic to build on, especially because we want to make it interoperable between all benchmarks.

Dhravya · 2026-02-17T18:42:17Z

Can we test this against Locomo and Convoman as well?

sentry bot reviewed Feb 17, 2026

src/orchestrator/batch.ts Outdated

Comment on lines 159 to 161

    if (questionIds && questionIds.length > 0) {
      targetQuestionIds = questionIds
      logger.info(`Using explicit questionIds: ${questionIds.length} questions`)

This comment was marked as outdated.

added question id

28a1861

sreedharsreeram force-pushed the 02-03_question_id branch from 25e2b45 to 28a1861 on February 17, 2026 04:13

sentry bot reviewed Feb 17, 2026

sreedharsreeram wants to merge 1 commit into main from 02-03_question_id
