[FE / Feat] Add evaluators to existing evals#4577
Draft
ardaerzin wants to merge 7 commits into
Draft
Conversation
- Shared 'Edit evaluation' drawer (name/description + evaluators) opened from a run-header actions dropdown (all tabs), the config General 'Edit' button, and the evaluations-table row action; the config General section is now display-only. - Jotai mutation flow (editSimpleEvaluation + process slice) with a terminal-gated background refresh so the evaluations list and the run scenarios table converge reliably (columns, metric cells, status) after an edit. - Resolve evaluator output metrics for staged (pending) evaluators in the drawer. - Dark mode fixes: drawer edge shadow, entity-picker hover/selected highlight, and the cascader child-panel loading/loaded width jump.
dispatch_run_slice re-activates the run (status=RUNNING, is_active=True) before dispatching the worker, so the status indicator reflects the reprocess; _finalize_run_after_slice floors it back to terminal when scoring completes. Adds an acceptance probe for the edit+process path.
Link ids recovered from stored result cells on the re-run/process path arrive as dashed UUIDs (live spans send bare hex); both encode the same integer. Strip dashes before base-16 parsing so add_link no longer raises ValueError on the hyphens.
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Drops the exploratory acceptance probe added alongside the run-status change; it was a proof-of-contract probe, not a maintained test.
junaway
approved these changes
Jun 8, 2026
… slice Mirrors the run-level re-activation at the scenario level so per-scenario status indicators also reflect the reprocess; dispatch_run_slice now bulk-sets the addressed scenarios to RUNNING/is_active before dispatch (full-PUT edit preserves flags/interval/ timestamp/meta), and the engine writes each scenario's terminal status back on completion.
…ding an evaluator The post-edit background refresh now (1) matches any query scoped to the run id (reload-equivalent — covers the scenario rows+status query the old allowlist missed), (2) detects run completion authoritatively via the run batcher instead of getQueryData, and (3) invalidates twice (now + a short settle) so cell results that persist just after the run status flips terminal aren't left frozen by the per-scenario poller.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Testing
Verified locally
Added or updated tests
QA follow-up
Demo
Checklist
Contributor Resources