{"actor":"sjarmak","comment":null,"created_at":"2026-03-09T22:07:15Z","event_type":"status_changed","id":56,"issue_id":"CodeScaleBench-ki9","new_value":"{\"status\":\"in_progress\"}","old_value":"{\"id\":\"CodeScaleBench-ki9\",\"title\":\"Fix OpenHands runtime crash on Daytona + investigate false-positive verifiers\",\"description\":\"Two intertwined issues discovered during OpenHands verification batch (runs/staging/openhands_sonnet46_20260309_210054):\\n\\n## Issue 1: OpenHands LocalRuntime crashes on Daytona (ALL tasks)\\n\\nEvery task (17/18 completed) crashes with:\\n```\\ntenacity.RetryError in openhands/runtime/impl/local/local_runtime.py:393 _wait_until_alive\\n```\\nOpenHands v1.4.0 LocalRuntime tries to start jupyter-kernelgateway + action execution server on localhost. It fails to bind/connect inside Daytona sandboxes. The agent never executes any actions.\\n\\nPrevious successful OpenHands runs (686 results in staging) must have used a different config or environment. Need to determine what changed.\\n\\n## Issue 2: Verifiers produce false-positive scores when agent makes no changes\\n\\nelement-web-roomheaderbuttons-can-crash-fix-001 MCP scored 1.0 even though the agent crashed and made ZERO code changes. The verifier ran tests against the unmodified repo and some passed. This is a contract violation — verifiers must detect \\\"no agent output\\\" and score 0.0 before running tests.\\n\\nSimilarly, django-rate-limit-design-001 scored 0.05 on both configs despite the agent never running.\\n\\nTasks affected: all test_ratio and repo_state_heuristic verifiers that don't have a guard check for \\\"did the agent actually produce output.\\\"\",\"status\":\"open\",\"priority\":1,\"issue_type\":\"bug\",\"owner\":\"sjarmak@users.noreply.github.com\",\"created_at\":\"2026-03-09T21:53:24Z\",\"created_by\":\"sjarmak\",\"updated_at\":\"2026-03-09T21:53:24Z\"}"}