Commit dd4d62e
Add calibrated curator ground truth (311/367) and harden Daytona sandbox lifecycle
Re-curated ground truth using Opus 4.6 + phase1 prompt + hybrid backend:
- SDLC: 104/160 tasks with ground_truth_agent.json (56 remaining, rate-limited)
- Org: 207/207 tasks with oracle_answer_agent.json (complete)
Curator runner hardening:
- Add cleanup_orphaned_sandboxes() at startup/shutdown to reclaim CPU quota
- Set auto_stop_interval=20, auto_archive_interval=60 to prevent sandbox leaks
- Add SIGTERM/SIGINT handler for graceful shutdown with sandbox cleanup
- Bump DEFAULT_PARALLEL from 20 to 55 (matches Tier 3 capacity)
- Add --overwrite-existing and --skip-agent-variants flags
- Add Strategy 5 (repo_fixture.local_checkout_repos) for Org task repo resolution
- Add signal.alarm(840) OS-level timeout in embedded curator runner
- Add onboarding-search task skip filter
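
The signal-related items above (the SIGTERM/SIGINT handler and the `signal.alarm(840)` timeout) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `run_with_alarm`, `install_shutdown_handlers`, `CuratorTimeout`, and the `cleanup` callback are hypothetical names, and the callback stands in for whatever `cleanup_orphaned_sandboxes()` actually does. Only the 840-second value comes from the commit message. The sketch assumes a Unix host and that both functions are called from the main thread, since Python only allows signal handlers to be installed there.

```python
import signal
import time
from typing import Callable

CURATOR_TIMEOUT_SECONDS = 840  # value taken from the commit message


class CuratorTimeout(Exception):
    """Raised when the OS-level alarm fires before the task finishes."""


def _on_alarm(signum, frame):
    # Runs in the main thread when SIGALRM is delivered.
    raise CuratorTimeout("curator run exceeded its time limit")


def run_with_alarm(task: Callable[[], str],
                   seconds: int = CURATOR_TIMEOUT_SECONDS) -> str:
    """Run task() under a SIGALRM-based timeout (Unix, main thread only)."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return task()
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


def install_shutdown_handlers(cleanup: Callable[[], None]) -> None:
    """Call cleanup() on SIGTERM/SIGINT, then exit with 128 + signal number."""

    def _handler(signum, frame):
        cleanup()  # hypothetical stand-in for sandbox cleanup
        raise SystemExit(128 + signum)

    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, _handler)
```

Raising an exception from the alarm handler interrupts blocking calls such as `time.sleep`, which is why this approach can bound a hung run where an in-process timer thread could not. It does not interrupt code stuck inside a C extension that never returns to the interpreter.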
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 60ea357 commit dd4d62e
File tree
779 files changed, +57429 -216 lines changed
- benchmarks
- csb_org_compliance
- ccx-compliance-052/tests
- ccx-compliance-053/tests
- ccx-compliance-115/tests
- ccx-compliance-118/tests
- ccx-compliance-124/tests
- ccx-compliance-182/tests
- ccx-compliance-183/tests
- ccx-compliance-184/tests
- ccx-compliance-185/tests
- ccx-compliance-186/tests
- ccx-compliance-187/tests
- ccx-compliance-189/tests
- ccx-compliance-190/tests
- ccx-compliance-191/tests
- ccx-compliance-192/tests
- ccx-compliance-193/tests
- ccx-compliance-194/tests
- csb_org_crossorg
- ccx-crossorg-062/tests
- ccx-crossorg-121/tests
- ccx-crossorg-132/tests
- ccx-crossorg-208/tests
- ccx-crossorg-209/tests
- ccx-crossorg-211/tests
- ccx-crossorg-213/tests
- ccx-crossorg-214/tests
- ccx-crossorg-216/tests
- ccx-crossorg-217/tests
- ccx-crossorg-218/tests
- ccx-crossorg-219/tests
- ccx-crossorg-220/tests
- ccx-crossorg-221/tests
- ccx-crossorg-222/tests
- ccx-crossorg-280/tests
- csb_org_crossrepo_tracing
- ccx-config-trace-010/tests
- ccx-dep-trace-001/tests
- ccx-dep-trace-002/tests
- ccx-dep-trace-004/tests
- ccx-dep-trace-102/tests
- ccx-dep-trace-123/tests
- ccx-dep-trace-133/tests
- ccx-dep-trace-171/tests
- ccx-dep-trace-172/tests
- ccx-dep-trace-173/tests
- ccx-dep-trace-174/tests
- ccx-dep-trace-175/tests
- ccx-dep-trace-177/tests
- ccx-dep-trace-178/tests
- ccx-dep-trace-179/tests
- ccx-dep-trace-180/tests
- ccx-dep-trace-181/tests
- ccx-dep-trace-272/tests
- ccx-dep-trace-273/tests
- csb_org_crossrepo
- ccx-dep-trace-106/tests
- ccx-dep-trace-253/tests
- ccx-dep-trace-254/tests
- ccx-dep-trace-258/tests
- ccx-dep-trace-260/tests
- ccx-dep-trace-261/tests
- ccx-dep-trace-262/tests
- ccx-dep-trace-263/tests
- ccx-dep-trace-264/tests
- ccx-dep-trace-265/tests
- ccx-dep-trace-266/tests
- ccx-dep-trace-267/tests
- ccx-dep-trace-268/tests
- ccx-dep-trace-271/tests
- csb_org_domain
- ccx-domain-072/tests
- ccx-domain-073/tests
- ccx-domain-074/tests
- ccx-domain-101/tests
- ccx-domain-112/tests
- ccx-domain-120/tests
- ccx-domain-129/tests
- ccx-domain-137/tests
- ccx-domain-140/tests
- ccx-domain-151/tests
- ccx-domain-152/tests
- ccx-domain-153/tests
- ccx-domain-154/tests
- ccx-domain-155/tests
- ccx-domain-156/tests
- ccx-domain-157/tests
- ccx-domain-158/tests
- ccx-domain-159/tests
- ccx-domain-160/tests
- csb_org_incident
- ccx-incident-032/tests
- ccx-incident-033/tests
- ccx-incident-034/tests
- ccx-incident-037/tests
- ccx-incident-108/tests
- ccx-incident-110/tests
- ccx-incident-113/tests
- ccx-incident-125/tests
- ccx-incident-131/tests
- ccx-incident-139/tests
- ccx-incident-142/tests
- ccx-incident-143/tests
- ccx-incident-144/tests
- ccx-incident-145/tests
- ccx-incident-146/tests
- ccx-incident-147/tests
- ccx-incident-148/tests
- ccx-incident-149/tests
- ccx-incident-150/tests
- csb_org_migration
- ccx-migration-025/tests
- ccx-migration-026/tests
- ccx-migration-027/tests
- ccx-migration-107/tests
- ccx-migration-114/tests
- ccx-migration-117/tests
- ccx-migration-195/tests
- ccx-migration-196/tests
- ccx-migration-197/tests
- ccx-migration-198/tests
- ccx-migration-199/tests
- ccx-migration-200/tests
- ccx-migration-201/tests
- ccx-migration-202/tests
- ccx-migration-203/tests
- ccx-migration-204/tests
- ccx-migration-205/tests
- ccx-migration-206/tests
- ccx-migration-207/tests
- ccx-migration-274/tests
- ccx-migration-275/tests
- ccx-migration-276/tests
- ccx-migration-277/tests
- ccx-migration-278/tests
- ccx-migration-279/tests
- csb_org_onboarding
- ccx-explore-042-ds/tests
- ccx-onboard-041/tests
- ccx-onboard-042/tests
- ccx-onboard-043/tests
- ccx-onboard-044/tests
- ccx-onboard-050-ds/tests
- ccx-onboard-050/tests
- ccx-onboard-103/tests
- ccx-onboard-109/tests
- ccx-onboard-128/tests
- ccx-onboard-134/tests
- ccx-onboard-136/tests
- ccx-onboard-138/tests
- ccx-onboard-280/tests
- csb_org_org
- ccx-agentic-081/tests
- ccx-agentic-082/tests
- ccx-agentic-083/tests
- ccx-agentic-122/tests
- ccx-agentic-127/tests
- ccx-agentic-223/tests
- ccx-agentic-224/tests
- ccx-agentic-225/tests
- ccx-agentic-229/tests
- ccx-agentic-232/tests
- ccx-agentic-233/tests
- ccx-agentic-234/tests
- ccx-agentic-235/tests
- ccx-agentic-236/tests
- ccx-agentic-237/tests
- csb_org_platform
- ccx-platform-091/tests
- ccx-platform-094/tests
- ccx-platform-100/tests
- ccx-platform-104/tests
- ccx-platform-119/tests
- ccx-platform-238/tests
- ccx-platform-239/tests
- ccx-platform-240/tests
- ccx-platform-241/tests
- ccx-platform-242/tests
- ccx-platform-243/tests
- ccx-platform-244/tests
- ccx-platform-245/tests
- ccx-platform-246/tests
- ccx-platform-248/tests
- ccx-platform-249/tests
- ccx-platform-250/tests
- ccx-platform-251/tests
- csb_org_security
- ccx-vuln-remed-012/tests
- ccx-vuln-remed-013/tests
- ccx-vuln-remed-014/tests
- ccx-vuln-remed-105/tests
- ccx-vuln-remed-111/tests
- ccx-vuln-remed-126/tests
- ccx-vuln-remed-130/tests
- ccx-vuln-remed-135/tests
- ccx-vuln-remed-141/tests
- ccx-vuln-remed-161/tests
- ccx-vuln-remed-162/tests
- ccx-vuln-remed-163/tests
- ccx-vuln-remed-164/tests
- ccx-vuln-remed-165/tests
- ccx-vuln-remed-166/tests
- ccx-vuln-remed-167/tests
- ccx-vuln-remed-168/tests
- ccx-vuln-remed-169/tests
- ccx-vuln-remed-170/tests
- ccx-vuln-remed-281/tests
- ccx-vuln-remed-282/tests
- ccx-vuln-remed-283/tests
- ccx-vuln-remed-284/tests
- csb_sdlc_debug
- envoy-duplicate-headers-debug-001/tests
- grafana-table-panel-regression-001/tests
- istio-xds-destrul-debug-001/tests
- prometheus-queue-reshard-debug-001/tests
- qutebrowser-adblock-cache-regression-prove-001/tests
- qutebrowser-hsv-color-regression-prove-001/tests
- terraform-phantom-update-debug-001/tests
- vuls-oval-regression-prove-001/tests
- csb_sdlc_design
- camel-routing-arch-001/tests
- django-orm-query-arch-001/tests
- django-pre-validate-signal-design-001/tests
- django-rate-limit-design-001/tests
- envoy-routeconfig-dep-chain-001/tests
- envoy-stream-aggregated-sym-001/tests
- etcd-grpc-api-upgrade-001/tests
- flink-checkpoint-arch-001/tests
- flipt-protobuf-metadata-design-001/tests
- flipt-transitive-deps-001/tests
- k8s-crd-lifecycle-arch-001/tests
- k8s-typemeta-dep-chain-001/tests
- postgres-query-exec-arch-001/tests
- csb_sdlc_document
- docgen-changelog-002/tests
- docgen-inline-002/tests
- docgen-runbook-001/tests
- envoy-arch-doc-gen-001/tests
- envoy-migration-doc-gen-001/tests
- k8s-clientgo-doc-gen-001/tests
- k8s-fairqueuing-doc-gen-001/tests
- k8s-kubelet-cm-doc-gen-001/tests
- kafka-api-doc-gen-001/tests
- csb_sdlc_feature
- camel-fix-protocol-feat-001/tests
- cilium-policy-audit-logger-feat-001/tests
- cilium-policy-quota-feat-001/tests
- curl-http3-priority-feat-001/tests
- django-rate-limit-middleware-feat-001/tests
- envoy-custom-header-filter-feat-001/tests
- envoy-grpc-server-impl-001/tests
- flink-pricing-window-feat-001/tests
- k8s-noschedule-taint-feat-001/tests
- k8s-runtime-object-impl-001/tests
- numpy-rolling-median-feat-001/tests
- pandas-merge-asof-indicator-feat-001/tests
- postgres-copy-csv-header-feat-001/tests
- prometheus-silence-bulk-api-feat-001/tests
- pytorch-gradient-noise-feat-001/tests
- servo-css-container-query-feat-001/tests
- terraform-compact-diff-fmt-feat-001/tests
- vscode-custom-fold-region-feat-001/tests
- vscode-stale-diagnostics-feat-001/tests
- csb_sdlc_fix
- django-select-for-update-fix-001/tests
- element-web-roomheaderbuttons-can-crash-fix-001/tests
- flipt-otlp-exporter-fix-001/tests
- flipt-trace-sampling-fix-001/tests
- k8s-dra-scheduler-event-fix-001/tests
- nodebb-notif-dropdown-fix-001/tests
- nodebb-plugin-validate-fix-001/tests
- openlibrary-fntocli-adapter-fix-001/tests
- openlibrary-search-query-fix-001/tests
- openlibrary-solr-boolean-fix-001/tests
- pytorch-cudnn-version-fix-001/tests
- pytorch-dynamo-keyerror-fix-001/tests
- pytorch-release-210-fix-001/tests
- pytorch-relu-gelu-fusion-fix-001/tests
- pytorch-tracer-graph-cleanup-fix-001/tests
- teleport-users-can-delete-fix-001/tests
- terraform-plan-null-unknown-fix-001/tests
- webclients-api-error-metrics-fix-001/tests
- webclients-excessive-repeated-api-fix-001/tests
- webclients-implement-proper-punycode-fix-001/tests
- webclients-incorrect-rendering-content-fix-001/tests
- csb_sdlc_refactor
- cilium-endpoint-manager-refac-001/tests
- django-request-factory-refac-001/tests
- envoy-listener-manager-refac-001/tests
- flipt-dep-refactor-001/tests
- flipt-flagexists-refactor-001/tests
- istio-discovery-server-refac-001/tests
- k8s-score-normalizer-refac-001/tests
- kafka-batch-accumulator-refac-001/tests
- kubernetes-scheduler-profile-refac-001/tests
- numpy-array-dispatch-refac-001/tests
- pandas-index-engine-refac-001/tests
- prometheus-query-engine-refac-001/tests
- strata-fx-european-refac-001/tests
- terraform-eval-context-refac-001/tests
- csb_sdlc_secure
- curl-cve-triage-001/tests
- curl-vuln-reachability-001/tests
- django-legacy-dep-vuln-001/tests
- django-role-based-access-001/tests
- flipt-degraded-context-fix-001/tests
- flipt-repo-scoped-access-001/tests
- k8s-rbac-auth-audit-001/tests
- csb_sdlc_test
- openhands-search-file-test-001/tests
- test-unitgen-py-001/tests
- scripts
Lines changed: 233 additions & 0 deletions
Lines changed: 18 additions & 0 deletions