Commit 44045f1

sjarmak and claude committed
fix: trace audit — fix 3 broken verifiers, MANIFEST model fallback, drop 2 TAC tasks
- navidrome test.sh: pytest → go test (Go/Ginkgo project)
- nodebb-notif/nodebb-plugin test.sh: pytest → npx mocha + Mocha reward parsing
- openlibrary Dockerfile.sg_only: pre-install Node.js 22 + Claude Code (sweap-images Node 16 broken)
- generate_manifest.py: model extraction falls back to result.json when config.json missing (fixes ccb_feature/ccb_refactor incorrectly showing opus instead of haiku)
- Drop 2 llamacpp TAC tasks (need external RocketChat server, incompatible with benchmark)
- selected_benchmark_tasks.json: 414 → 412 tasks, ccb_test 20 → 18
- Add handoff doc for rerun setup (local Docker + Daytona)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent e100ad4 commit 44045f1

7 files changed: +601 -696 lines changed

benchmarks/ccb_fix/navidrome-windows-log-fix-001/tests/test.sh

Lines changed: 4 additions & 4 deletions
@@ -112,11 +112,11 @@ else
 echo "WARNING: Test patch failed to apply."
 fi

-echo "Running specific pytest tests..."
-echo "Tests: "TestLog""
+echo "Running Go tests..."
+echo "Tests: TestLog"

-# Run only the specific tests from fail_to_pass and pass_to_pass
-python -m pytest "TestLog" -v 2>&1 | tee test_output.log
+# Navidrome is a Go project using Ginkgo. Run the log package tests.
+go test -v -run "TestLog" ./log/... 2>&1 | tee test_output.log
 TEST_EXIT_CODE=$?

 # Write reward for Harbor with partial credit
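
One thing worth checking in this hunk: `TEST_EXIT_CODE=$?` follows a pipeline into `tee`, so unless the script enables `pipefail` earlier (not visible in this hunk), `$?` reports the status of `tee` rather than of `go test`. The sketch below illustrates the difference with a portable workaround. `run_tests` is a hypothetical stand-in for the real `go test` command, and the temp-file approach is only one option, not necessarily what this repository does.

```shell
# Hypothetical stand-in for `go test ...`: prints a failure and exits 1.
run_tests() { echo "FAIL: TestLog"; return 1; }

# Plain pipeline: $? is the exit status of tee, the last stage.
run_tests 2>&1 | tee /dev/null >/dev/null
plain_status=$?

# Portable workaround: record the first stage's status in a temp file.
status_file=$(mktemp)
{ run_tests 2>&1; echo $? > "$status_file"; } | tee /dev/null >/dev/null
test_status=$(cat "$status_file")
rm -f "$status_file"

echo "plain=$plain_status tests=$test_status"
```

In bash specifically, `set -o pipefail` or `${PIPESTATUS[0]}` read immediately after the pipeline would serve the same purpose with less ceremony.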

benchmarks/ccb_fix/nodebb-notif-dropdown-fix-001/tests/test.sh

Lines changed: 13 additions & 3 deletions
Large diffs are not rendered by default.

benchmarks/ccb_fix/nodebb-plugin-validate-fix-001/tests/test.sh

Lines changed: 12 additions & 2 deletions
Large diffs are not rendered by default.
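
The two NodeBB diffs are not rendered here, so the actual "Mocha reward parsing" from the commit message is not visible. As a rough sketch of what such parsing could look like, assuming Mocha's default spec reporter summary lines (`N passing`, `M failing`) and a partial-credit reward equal to the pass fraction, one might write the following. The variable names and the sample output are hypothetical.

```shell
# Hypothetical sample of the summary lines printed by Mocha's spec reporter.
mocha_output='
  3 passing (12ms)
  1 failing
'

# Pull out the counts; Mocha omits the "failing" line when everything passes,
# so default each count to 0 when no match is found.
passing=$(printf '%s\n' "$mocha_output" | sed -n 's/^ *\([0-9][0-9]*\) passing.*/\1/p' | head -n 1)
failing=$(printf '%s\n' "$mocha_output" | sed -n 's/^ *\([0-9][0-9]*\) failing.*/\1/p' | head -n 1)
passing=${passing:-0}
failing=${failing:-0}

# Partial credit: fraction of tests that passed, or 0.00 if nothing ran.
reward=$(awk -v p="$passing" -v f="$failing" \
    'BEGIN { t = p + f; if (t == 0) printf "0.00"; else printf "%.2f", p / t }')
echo "reward=$reward"
```

A real verifier would read the counts from the captured test log rather than an inline string, and might prefer a machine-readable reporter over scraping summary text.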

benchmarks/ccb_fix/openlibrary-solr-boolean-fix-001/environment/Dockerfile.sg_only

Lines changed: 8 additions & 0 deletions
@@ -6,6 +6,14 @@ FROM jefzda/sweap-images:internetarchive.openlibrary-internetarchive__openlibrar

 ENV SOURCEGRAPH_REPO_NAME=sg-evals/openlibrary--92db3454

+# Pre-install Node.js 22 + Claude Code at build time.
+# Harbor's install.sh fails on this image (NodeSource GPG needs /dev/tty)
+# but if claude is already on PATH, /tmp/claude_run.sh finds it.
+RUN curl -fsSL https://nodejs.org/dist/v22.14.0/node-v22.14.0-linux-x64.tar.gz \
+| tar -xz -C /usr/local --strip-components=1 \
+&& node --version && npm --version \
+&& npm install -g @anthropic-ai/claude-code@latest \
+&& which claude && claude --version

 RUN curl -LsSf https://astral.sh/uv/0.7.13/install.sh | sh || true
 RUN mkdir -p /logs /workspace
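
The Dockerfile comment relies on the run script finding `claude` on `PATH` even though the installer failed. The contents of `/tmp/claude_run.sh` are not shown in this commit, so the following is only a sketch of the general "skip install if already present" guard, with `needs_install` as a hypothetical helper name.

```shell
# Hypothetical helper: succeeds (returns 0) only when the tool is NOT on PATH.
needs_install() {
    if command -v "$1" >/dev/null 2>&1; then
        return 1   # already installed, nothing to do
    fi
    return 0
}

if needs_install claude; then
    echo "claude not found; an installer would run here"
else
    echo "claude found at $(command -v claude); skipping install"
fi
```

With a guard of this shape, baking the binary into the image at build time makes the runtime install step unnecessary, which matches the intent described in the comment.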
