Overhaul judge and criteria for E2E testing with CLI agent reviewers #4857

Triggered via push on March 26, 2026, 18:56
Status: Success
Total duration: 10s
Artifacts

evals.yml

on: push
run-evals (6s)

Annotations

1 warning
run-evals
The process '/usr/bin/git' failed with exit code 128