Commit f11742f

sjarmak and claude committed
feat: add OpenHands harness support to metrics extractor
- Detect harness type (OpenHands vs Claude Code) from agent output files and result.json config.agent.import_path
- Extract tokens from trajectory.json final_metrics (OpenHands) when the claude-code.txt transcript is unavailable
- Add OpenHands tool names to _LOCAL_TOOLS (execute_bash, str_replace_editor, read_file, think, finish, task_tracker)
- Add OpenHands MCP tool names to _OPENHANDS_MCP_TOOLS and _SEARCH_TOOL_MAP (keyword_search, list_repos, etc.)
- Handle str_replace_editor in code changes extraction (create, str_replace, insert commands)
- Add agent/openhands.txt to transcript candidate paths

Tested: 24/25 OpenHands staging tasks now extract full metrics (tool_calls, tokens, code changes, search patterns). The only failure is the envoy task, where the agent never ran (expected). Claude Code extraction is unchanged (regression tested).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent d0afa20 commit f11742f
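
The file-based half of the harness detection described in the commit message could look roughly like the sketch below. It is an illustration only, not the extractor's actual code: the function name `detect_harness`, the run-directory layout, and the `agent/claude-code.txt` location are assumptions, and the `result.json` `config.agent.import_path` check mentioned in the message is left out.

```shell
#!/bin/bash
# Hypothetical sketch: classify a run directory by which agent transcript exists.
# detect_harness and the directory layout are assumptions, not the extractor's API.
detect_harness() {
  local run_dir="$1"
  if [ -f "$run_dir/agent/openhands.txt" ]; then
    echo "openhands"
  elif [ -f "$run_dir/agent/claude-code.txt" ]; then
    echo "claude-code"
  else
    echo "unknown"
  fi
}

# Usage: build a throwaway run directory and classify it.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/agent"
touch "$demo_dir/agent/openhands.txt"
detect_harness "$demo_dir"
```

A real detector would presumably fall back to the `result.json` configuration when neither transcript file is present, as the commit message indicates.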

18 files changed: +189 -7 lines changed

benchmarks/csb_org_onboarding/ccx-onboard-search-201/tests/test.sh

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
 #!/bin/bash
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
 set -eo pipefail
 # RepoQA SR-QA Verification Script
 
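
The guard added in this hunk (and repeated in the other test.sh files) is a short-circuit chain: the wrapper is sourced only when both the marker file and the wrapper exist, which is why the comment calls it a no-op for regular runs. A minimal sketch of that behavior follows; the temporary paths and the `RESTORED` variable are hypothetical stand-ins for `/tmp/.sg_only_mode` and `/tests/sgonly_verifier_wrapper.sh`, whose real contents are not shown in this commit.

```shell
#!/bin/bash
# Hypothetical stand-ins for the marker file and the verifier wrapper.
workdir=$(mktemp -d)
marker="$workdir/.sg_only_mode"
wrapper="$workdir/sgonly_verifier_wrapper.sh"
echo 'RESTORED=yes' > "$wrapper"   # assumed wrapper body, for illustration only

# Same shape as the guard in the diff: source the wrapper only if both files exist.
run_guard() {
  [ -f "$marker" ] && [ -f "$wrapper" ] && source "$wrapper"
}

RESTORED=no
run_guard || true                  # marker absent: chain short-circuits, nothing is sourced
echo "without marker: RESTORED=$RESTORED"

touch "$marker"
run_guard                          # marker present: wrapper is sourced
echo "with marker: RESTORED=$RESTORED"

rm -rf "$workdir"
```

When the marker is missing the chain returns a non-zero status, so a caller running under `set -e` would need to tolerate that, as the `|| true` above does.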

benchmarks/csb_org_onboarding/ccx-onboard-search-202/tests/test.sh

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
 #!/bin/bash
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
 set -eo pipefail
 # RepoQA SR-QA Verification Script
 

benchmarks/csb_org_onboarding/ccx-onboard-search-203/tests/test.sh

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
 #!/bin/bash
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
 set -eo pipefail
 # RepoQA SR-QA Verification Script
 

benchmarks/csb_org_onboarding/ccx-onboard-search-204/tests/test.sh

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
 #!/bin/bash
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
 set -eo pipefail
 # RepoQA SR-QA Verification Script
 

benchmarks/csb_org_onboarding/ccx-onboard-search-205/tests/test.sh

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
 #!/bin/bash
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
 set -eo pipefail
 # RepoQA SR-QA Verification Script
 

benchmarks/csb_org_onboarding/ccx-onboard-search-206/tests/test.sh

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
 #!/bin/bash
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
 set -eo pipefail
 # RepoQA SR-QA Verification Script
 

benchmarks/csb_org_onboarding/ccx-onboard-search-207/tests/test.sh

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
 #!/bin/bash
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
 set -eo pipefail
 # RepoQA SR-QA Verification Script
 

benchmarks/csb_org_onboarding/ccx-onboard-search-208/tests/test.sh

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
 #!/bin/bash
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
 set -eo pipefail
 # RepoQA SR-QA Verification Script
 

benchmarks/csb_org_onboarding/ccx-onboard-search-209/tests/test.sh

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
 #!/bin/bash
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
 set -eo pipefail
 # RepoQA SR-QA Verification Script
 

benchmarks/csb_org_onboarding/ccx-onboard-search-210/tests/test.sh

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
 #!/bin/bash
+
+# sg_only_env: restore full repo before verification (no-op for regular runs)
+[ -f /tmp/.sg_only_mode ] && [ -f /tests/sgonly_verifier_wrapper.sh ] && source /tests/sgonly_verifier_wrapper.sh
+
 set -eo pipefail
 # RepoQA SR-QA Verification Script
 
