[FE / Feat] Add evaluators to existing evals #4577

Draft
ardaerzin wants to merge 7 commits into feat/unified-eval-loops from fe-feat/add-evaluators-to-existing-eval

@ardaerzin
Contributor

Summary

Testing

Verified locally

Added or updated tests

QA follow-up

Demo

Checklist

  • I have included a video or screen recording for UI changes, or marked Demo as N/A
  • Relevant tests pass locally
  • Relevant linting and formatting pass locally
  • I have signed the CLA, or I will sign it when the bot prompts me

Contributor Resources

ardaerzin added 3 commits June 7, 2026 18:41
- Shared 'Edit evaluation' drawer (name/description + evaluators) opened from a run-header
  actions dropdown (all tabs), the config General 'Edit' button, and the evaluations-table
  row action; the config General section is now display-only.
- Jotai mutation flow (editSimpleEvaluation + process slice) with a terminal-gated
  background refresh so the evaluations list and the run scenarios table converge reliably
  (columns, metric cells, status) after an edit.
- Resolve evaluator output metrics for staged (pending) evaluators in the drawer.
- Dark mode fixes: drawer edge shadow, entity-picker hover/selected highlight, and the
  cascader child-panel loading/loaded width jump.

dispatch_run_slice re-activates the run (status=RUNNING, is_active=True) before dispatching
the worker, so the status indicator reflects the reprocess; _finalize_run_after_slice floors
it back to terminal when scoring completes. Adds an acceptance probe for the edit+process path.
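
The status round trip described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the `Run` dataclass, the `RunStatus` members other than RUNNING, the set of terminal statuses, and both function names are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class RunStatus(str, Enum):
    RUNNING = "running"
    SUCCESS = "success"   # hypothetical terminal status
    FAILURE = "failure"   # hypothetical terminal status


# Assumption: the commit does not list which statuses count as terminal.
TERMINAL = {RunStatus.SUCCESS, RunStatus.FAILURE}


@dataclass
class Run:
    status: RunStatus
    is_active: bool


def reactivate_for_slice(run: Run) -> None:
    """Mark a finished run as live again before the worker is dispatched."""
    run.status = RunStatus.RUNNING
    run.is_active = True


def finalize_after_slice(run: Run, outcome: RunStatus) -> None:
    """Put the run back into a terminal status once scoring completes."""
    if outcome not in TERMINAL:
        raise ValueError(f"{outcome} is not a terminal status")
    run.status = outcome
    # is_active is left untouched; the commit does not say how it is handled.


run = Run(status=RunStatus.SUCCESS, is_active=False)
reactivate_for_slice(run)          # the status indicator now shows RUNNING
finalize_after_slice(run, RunStatus.SUCCESS)
```

Re-activating before dispatch, rather than inside the worker, means the UI can show the reprocess as soon as the request is accepted.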

Link ids recovered from stored result cells on the re-run/process path arrive as dashed
UUIDs (live spans send bare hex); both encode the same integer. Strip dashes before base-16
parsing so add_link no longer raises ValueError on the hyphens.
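
A minimal sketch of the normalization this commit describes, assuming ids arrive as strings. `parse_hex_id` is a hypothetical helper name and the sample id is made up; the real change lives wherever add_link parses its ids.

```python
import uuid


def parse_hex_id(value: str) -> int:
    """Parse an id given either as bare hex or as a dashed UUID string."""
    # int("0af7-...", 16) raises ValueError because "-" is not a hex digit,
    # so the dashes are removed before base-16 parsing.
    return int(value.replace("-", ""), 16)


dashed = "0af76519-16cd-43dd-8448-eb211c80319c"  # shape recovered from stored cells
bare = "0af7651916cd43dd8448eb211c80319c"        # shape sent by live spans

# Both spellings encode the same integer, which is also what uuid.UUID reports.
assert parse_hex_id(dashed) == parse_hex_id(bare) == uuid.UUID(dashed).int
```

Stripping dashes keeps a single parsing path for both sources instead of branching on the id format.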
@vercel

vercel Bot commented Jun 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| agenta-documentation | Ready | Preview, Comment | Jun 8, 2026 2:25pm |

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 3611c803-c94c-47c5-b8f0-90668ec0b627

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Drops the exploratory acceptance probe added alongside the run-status change; it was a
proof-of-contract probe, not a maintained test.
ardaerzin added 2 commits June 8, 2026 16:21
… slice

Mirrors the run-level re-activation at the scenario level so per-scenario status
indicators also reflect the reprocess; dispatch_run_slice now bulk-sets the addressed
scenarios to RUNNING/is_active before dispatch (full-PUT edit preserves flags/interval/
timestamp/meta), and the engine writes each scenario's terminal status back on completion.
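
The scenario-level step can be pictured as a bulk update that changes only the status fields and carries everything else through unchanged, which is what a full-PUT edit requires. The sketch below is an assumption-laden illustration: the `Scenario` fields mirror the names in the commit message, but the types, the `mark_running` helper, and the status strings are hypothetical.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    id: str
    status: str
    is_active: bool
    flags: dict = field(default_factory=dict)
    interval: Optional[int] = None
    timestamp: Optional[str] = None
    meta: dict = field(default_factory=dict)


def mark_running(scenarios: list, addressed_ids: set) -> list:
    """Return full replacement records with only the addressed scenarios re-activated."""
    return [
        # replace() copies every field and overrides just status and is_active,
        # so flags, interval, timestamp and meta survive the full-PUT write.
        replace(s, status="running", is_active=True) if s.id in addressed_ids else s
        for s in scenarios
    ]


scenarios = [
    Scenario("a", "success", False, flags={"pinned": True}, interval=5),
    Scenario("b", "success", False),
]
updated = mark_running(scenarios, {"a"})
```

Only scenario "a" is flipped to running here; "b" is returned as-is because the slice does not address it.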
…ding an evaluator

The post-edit background refresh now (1) matches any query scoped to the run id
(reload-equivalent — covers the scenario rows+status query the old allowlist missed),
(2) detects run completion authoritatively via the run batcher instead of getQueryData,
and (3) invalidates twice (now + a short settle) so cell results that persist just after
the run status flips terminal aren't left frozen by the per-scenario poller.