bd: backup 2026-03-10 11:57

sjarmak · sjarmak · commit 87c277a8a4a6 · 2026-03-10T11:57:53.000Z
diff --git a/.beads/backup/backup_state.json b/.beads/backup/backup_state.json
@@ -1,10 +1,10 @@
 {
-  "last_dolt_commit": "504lv2a152h6ut0g2l7sf1jonrcsckdg",
+  "last_dolt_commit": "75e6jlecsd93v38ji1glvs7rg3ai3dtn",
   "last_event_id": 0,
-  "timestamp": "2026-03-10T11:27:18.174288292Z",
+  "timestamp": "2026-03-10T11:57:53.483867571Z",
   "counts": {
     "issues": 19,
-    "events": 58,
+    "events": 59,
     "comments": 0,
     "dependencies": 10,
     "labels": 0,
diff --git a/.beads/backup/events.jsonl b/.beads/backup/events.jsonl
@@ -56,3 +56,4 @@
 {"actor":"sjarmak","comment":null,"created_at":"2026-03-09T22:07:15Z","event_type":"status_changed","id":56,"issue_id":"CodeScaleBench-ki9","new_value":"{\"status\":\"in_progress\"}","old_value":"{\"id\":\"CodeScaleBench-ki9\",\"title\":\"Fix OpenHands runtime crash on Daytona + investigate false-positive verifiers\",\"description\":\"Two intertwined issues discovered during OpenHands verification batch (runs/staging/openhands_sonnet46_20260309_210054):\\n\\n## Issue 1: OpenHands LocalRuntime crashes on Daytona (ALL tasks)\\n\\nEvery task (17/18 completed) crashes with:\\n```\\ntenacity.RetryError in openhands/runtime/impl/local/local_runtime.py:393 _wait_until_alive\\n```\\nOpenHands v1.4.0 LocalRuntime tries to start jupyter-kernelgateway + action execution server on localhost. It fails to bind/connect inside Daytona sandboxes. The agent never executes any actions.\\n\\nPrevious successful OpenHands runs (686 results in staging) must have used a different config or environment. Need to determine what changed.\\n\\n## Issue 2: Verifiers produce false-positive scores when agent makes no changes\\n\\nelement-web-roomheaderbuttons-can-crash-fix-001 MCP scored 1.0 even though the agent crashed and made ZERO code changes. The verifier ran tests against the unmodified repo and some passed. This is a contract violation — verifiers must detect \\\"no agent output\\\" and score 0.0 before running tests.\\n\\nSimilarly, django-rate-limit-design-001 scored 0.05 on both configs despite the agent never running.\\n\\nTasks affected: all test_ratio and repo_state_heuristic verifiers that don't have a guard check for \\\"did the agent actually produce output.\\\"\",\"status\":\"open\",\"priority\":1,\"issue_type\":\"bug\",\"owner\":\"sjarmak@users.noreply.github.com\",\"created_at\":\"2026-03-09T21:53:24Z\",\"created_by\":\"sjarmak\",\"updated_at\":\"2026-03-09T21:53:24Z\"}"}
 {"actor":"sjarmak","comment":null,"created_at":"2026-03-09T22:16:43Z","event_type":"closed","id":57,"issue_id":"CodeScaleBench-ki9","new_value":"Fixed: OpenHands [core] TOML config + no-changes guard on 317 verifier files","old_value":""}
 {"actor":"sjarmak","comment":null,"created_at":"2026-03-10T11:27:18Z","event_type":"created","id":58,"issue_id":"CodeScaleBench-yb4","new_value":"","old_value":""}
+{"actor":"sjarmak","comment":null,"created_at":"2026-03-10T11:27:26Z","event_type":"closed","id":59,"issue_id":"CodeScaleBench-2kz","new_value":"OH jupyter fix confirmed working: d0fab95 monkey-patches sandbox_plugins as list. Post-fix runs show 0 RetryError, 0 fget, 0 jupyter crashes. Remaining infra issues tracked in yb4.","old_value":""}
diff --git a/.beads/backup/issues.jsonl b/.beads/backup/issues.jsonl