Commit c5a36ed

LoCoBench Bot and claude committed
fix: QA audit fixes for Ralph-generated benchmark tasks
Critical fixes across 7 suites (crossrepo, docgen, investigation, nlqa, onboarding, security):

- Replace fabricated git SHAs with real ones in 3 crossrepo Dockerfiles (verified via GitHub API)
- Fix 9/10 wrong file paths in crossrepo-sym-003 ground_truth.json
- Fix hardcoded host paths in docgen-migration test.sh (use /tests/ Harbor path)
- Rewrite docgen-migration test.sh to follow Harbor verifier conventions (reward.txt, exit 0)
- Enrich near-empty docgen-api-003 task.toml with name/description/metadata
- Remove MCP/Sourcegraph tool name contamination from 5 suite CLAUDE.md files
- Remove MCP tool references from 5 instruction.md files (baseline contamination)
- chmod +x on 15 test.sh files missing execute permission

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 95683eb commit c5a36ed

32 files changed: +133 -135 lines changed

benchmarks/ccb_crossrepo/crossrepo-chain-001/instruction.md

Lines changed: 1 addition & 1 deletion
@@ -60,6 +60,6 @@ Write your results to `/workspace/chain.json`:
 ## Notes
 
 - The kubernetes/kubernetes repository contains a staging directory (`staging/src/k8s.io/`) with code that is synced to separate repositories (kubernetes/api, kubernetes/apimachinery). For this task, treat them as separate codebases.
-- Use `go_to_definition` or cross-file search to trace imports and type references.
+- Use cross-file search or definition lookup to trace imports and type references.
 - You may encounter intermediate re-exports—document all steps.
 - Line numbers are approximate; +/- 50 lines is acceptable if the symbol is in that region.

benchmarks/ccb_crossrepo/crossrepo-chain-001/tests/test.sh

File mode changed: 100644 → 100755
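
The mode change above (100644 to 100755) is what `chmod +x` produces on a typical checkout. As a rough sketch of how such files could be found before they break a verifier run, the following walks a directory tree and reports every `test.sh` without the owner-execute bit. The function names `find_non_executable` and `make_executable` are hypothetical helpers for illustration, not part of this repository.

```python
import stat
from pathlib import Path


def find_non_executable(root: str, name: str = "test.sh") -> list[Path]:
    """Return files called `name` under `root` that lack the owner-execute bit."""
    missing = []
    for path in sorted(Path(root).rglob(name)):
        if path.is_file() and not path.stat().st_mode & stat.S_IXUSR:
            missing.append(path)
    return missing


def make_executable(path: Path) -> None:
    """Roughly `chmod +x`: add execute bits for user, group and other."""
    mode = path.stat().st_mode
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Running `find_non_executable("benchmarks")` from the repository root would list candidate files; whether git records the change also depends on `core.fileMode` being enabled.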

benchmarks/ccb_crossrepo/crossrepo-chain-002/instruction.md

Lines changed: 1 addition & 1 deletion
@@ -60,6 +60,6 @@ Write your results to `/workspace/chain.json`:
 ## Notes
 
 - The go-control-plane repository contains **generated code** from protobuf definitions. The `.pb.go` files are auto-generated from `.proto` files in data-plane-api.
-- Use `go_to_definition` or import tracing to navigate from Istio's usage to the generated Go types.
+- Use definition lookup or import tracing to navigate from Istio's usage to the generated Go types.
 - The final step should identify the **source `.proto` file**, not just the generated code.
 - Line numbers are approximate; +/- 50 lines is acceptable if the symbol is in that region.

benchmarks/ccb_crossrepo/crossrepo-chain-002/tests/test.sh

File mode changed: 100644 → 100755

benchmarks/ccb_crossrepo/crossrepo-impl-001/environment/Dockerfile

Lines changed: 6 additions & 6 deletions
@@ -13,15 +13,15 @@ RUN apt-get update -qq && apt-get install -y -qq \
 
 WORKDIR /workspace
 
-# Clone kubernetes/api at pinned SHA (2026-02-16)
+# Clone kubernetes/api (same pinned SHA as crossrepo-chain-001)
 RUN git clone --depth 1 https://github.com/kubernetes/api.git api \
-&& cd api && git fetch --depth 1 origin c4a8a38b3ae0f0f8d6c8e0b3f7d9c8e8f9e0c8d7 \
-&& git checkout c4a8a38b3ae0f0f8d6c8e0b3f7d9c8e8f9e0c8d7
+&& cd api && git fetch --depth 1 origin f32ed1d60cf0787a512bebd6c06a4b84ae0b7cc7 \
+&& git checkout f32ed1d60cf0787a512bebd6c06a4b84ae0b7cc7
 
-# Clone kubernetes/apimachinery at pinned SHA (2026-02-16)
+# Clone kubernetes/apimachinery (same pinned SHA as crossrepo-chain-001)
 RUN git clone --depth 1 https://github.com/kubernetes/apimachinery.git apimachinery \
-&& cd apimachinery && git fetch --depth 1 origin a8e3f1c7b9d2e4f6a8b0c9d7e5f3a1b8c6d4e2f0 \
-&& git checkout a8e3f1c7b9d2e4f6a8b0c9d7e5f3a1b8c6d4e2f0
+&& cd apimachinery && git fetch --depth 1 origin b2e9f88ff6d4c50c13061a53b1239c7707354eda \
+&& git checkout b2e9f88ff6d4c50c13061a53b1239c7707354eda
 
 RUN mkdir -p /logs/verifier /logs/agent
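
The commit message says the replacement SHAs were verified via the GitHub API. A minimal sketch of how such a check might be prepared is shown below: it pulls each (repository, SHA) pair out of a Dockerfile and builds the URL of GitHub's REST "get a commit" endpoint for it. The helper names `extract_pins` and `commit_api_url` are hypothetical, the regular expressions only cover the `git clone ... && git checkout <sha>` shape used in these Dockerfiles, and the actual HTTP request is left out.

```python
import re

# Matches the clone URL form used in these Dockerfiles (optional --depth N, optional .git).
CLONE_RE = re.compile(
    r"git clone\s+(?:--depth\s+\d+\s+)?https://github\.com/([\w.-]+)/([\w.-]+?)(?:\.git)?\s"
)
# A full 40-character hex SHA following `git checkout`.
CHECKOUT_RE = re.compile(r"git checkout\s+([0-9a-f]{40})\b")


def extract_pins(dockerfile_text: str) -> list[tuple[str, str]]:
    """Pair each cloned GitHub repo with the SHA checked out in the same RUN step."""
    # Join backslash-continued lines so each RUN instruction is one logical line.
    joined = re.sub(r"\\\s*\n", " ", dockerfile_text)
    pins = []
    for line in joined.splitlines():
        clone = CLONE_RE.search(line)
        checkout = CHECKOUT_RE.search(line)
        if clone and checkout:
            owner, repo = clone.groups()
            pins.append((f"{owner}/{repo}", checkout.group(1)))
    return pins


def commit_api_url(repo: str, sha: str) -> str:
    """URL of the REST "get a commit" endpoint for `sha` in `repo` (owner/name)."""
    return f"https://api.github.com/repos/{repo}/commits/{sha}"
```

A non-200 response for one of these URLs would suggest the pinned SHA does not exist in that repository, which is the failure mode the fabricated SHAs above would have hit at image build time.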

benchmarks/ccb_crossrepo/crossrepo-impl-002/environment/Dockerfile

Lines changed: 9 additions & 9 deletions
@@ -13,20 +13,20 @@ RUN apt-get update -qq && apt-get install -y -qq \
 
 WORKDIR /workspace
 
-# Clone envoyproxy/go-control-plane at pinned SHA (2026-02-16)
+# Clone envoyproxy/go-control-plane at xdsmatcher/v0.14.0
 RUN git clone --depth 1 https://github.com/envoyproxy/go-control-plane.git go-control-plane \
-&& cd go-control-plane && git fetch --depth 1 origin 0b8a46e8b5e5e8c5d9a7f4b3c2e1a8d9f6c7e5a4 \
-&& git checkout 0b8a46e8b5e5e8c5d9a7f4b3c2e1a8d9f6c7e5a4
+&& cd go-control-plane && git fetch --depth 1 origin 71637ad69bbc5f51fbb2562e612a4365292804a5 \
+&& git checkout 71637ad69bbc5f51fbb2562e612a4365292804a5
 
-# Clone istio/istio at pinned SHA (2026-02-16)
+# Clone istio/istio at 1.29.0
 RUN git clone --depth 1 https://github.com/istio/istio.git istio \
-&& cd istio && git fetch --depth 1 origin 7f8c6b5a4d3e2c1b9a8f7e6d5c4b3a2e1d0c9b8a \
-&& git checkout 7f8c6b5a4d3e2c1b9a8f7e6d5c4b3a2e1d0c9b8a
+&& cd istio && git fetch --depth 1 origin 2300e2458ab713c2c514a58da1ea8b03343ada7e \
+&& git checkout 2300e2458ab713c2c514a58da1ea8b03343ada7e
 
-# Clone emissary-ingress/emissary at pinned SHA (2026-02-16)
+# Clone emissary-ingress/emissary at v4.0.0-rc.1
 RUN git clone --depth 1 https://github.com/emissary-ingress/emissary.git emissary \
-&& cd emissary && git fetch --depth 1 origin 3e5a9c8b7d6f4a2e1c0b9a8d7f6e5c4b3a2d1e0f \
-&& git checkout 3e5a9c8b7d6f4a2e1c0b9a8d7f6e5c4b3a2d1e0f
+&& cd emissary && git fetch --depth 1 origin 3bbdbe0fafcc9dd6b9f54935d34f6614afb49302 \
+&& git checkout 3bbdbe0fafcc9dd6b9f54935d34f6614afb49302
 
 RUN mkdir -p /logs/verifier /logs/agent

benchmarks/ccb_crossrepo/crossrepo-sym-001/tests/test.sh

File mode changed: 100644 → 100755

benchmarks/ccb_crossrepo/crossrepo-sym-003/environment/Dockerfile

Lines changed: 6 additions & 8 deletions
@@ -13,17 +13,15 @@ RUN apt-get update -qq && apt-get install -y -qq \
 
 WORKDIR /workspace
 
-# Clone hashicorp/terraform at pinned SHA (v1.10.5 release - 2026-02-14)
-# This version includes providers.Interface with all modern provider methods
+# Clone hashicorp/terraform at v1.15.0-alpha20260204
 RUN git clone --depth 1 https://github.com/hashicorp/terraform.git terraform \
-&& cd terraform && git fetch --depth 1 origin 6b7e0f87f5a7a2c8faa83d2c7a5f2f2e7d5c4b3a \
-&& git checkout 6b7e0f87f5a7a2c8faa83d2c7a5f2f2e7d5c4b3a
+&& cd terraform && git fetch --depth 1 origin f65c52c8995a627e64c82827c75edf99a9399e4b \
+&& git checkout f65c52c8995a627e64c82827c75edf99a9399e4b
 
-# Clone hashicorp/terraform-provider-aws at pinned SHA (v5.87.0 - 2026-02-13)
-# Recent stable release with full Provider interface implementation
+# Clone hashicorp/terraform-provider-aws at v6.32.1
 RUN git clone --depth 1 https://github.com/hashicorp/terraform-provider-aws.git terraform-provider-aws \
-&& cd terraform-provider-aws && git fetch --depth 1 origin 5a2c7f3e8b9d1a4f6e2c8d7b3a9f1e5c4b8d2a6f \
-&& git checkout 5a2c7f3e8b9d1a4f6e2c8d7b3a9f1e5c4b8d2a6f
+&& cd terraform-provider-aws && git fetch --depth 1 origin e9b4629e752e76551c7bca0f317e98e43bc96a7c \
+&& git checkout e9b4629e752e76551c7bca0f317e98e43bc96a7c
 
 RUN mkdir -p /logs/verifier /logs/agent

benchmarks/ccb_crossrepo/crossrepo-sym-003/tests/ground_truth.json

Lines changed: 10 additions & 10 deletions
@@ -3,47 +3,47 @@
 "entries": [
 {
 "repo": "hashicorp/terraform",
-"file": "terraform/provider_mock.go",
+"file": "internal/providers/testing/provider_mock.go",
 "function": "MockProvider"
 },
 {
 "repo": "hashicorp/terraform",
-"file": "terraform/eval_provider.go",
+"file": "internal/terraform/eval_provider.go",
 "function": "getProvider"
 },
 {
 "repo": "hashicorp/terraform",
-"file": "terraform/node_provider_abstract.go",
-"function": "NodeAbstractProvider.Execute"
+"file": "internal/terraform/node_provider.go",
+"function": "NodeApplyableProvider.Execute"
 },
 {
 "repo": "hashicorp/terraform",
-"file": "terraform/node_resource_apply_instance.go",
+"file": "internal/terraform/node_resource_apply_instance.go",
 "function": "managedResourceExecute"
 },
 {
 "repo": "hashicorp/terraform",
-"file": "terraform/node_resource_plan_instance.go",
+"file": "internal/terraform/node_resource_abstract_instance.go",
 "function": "plan"
 },
 {
 "repo": "hashicorp/terraform",
-"file": "terraform/node_resource_plan_instance.go",
+"file": "internal/terraform/node_resource_abstract_instance.go",
 "function": "refresh"
 },
 {
 "repo": "hashicorp/terraform",
-"file": "terraform/node_resource_validate.go",
+"file": "internal/terraform/node_resource_validate.go",
 "function": "validateResource"
 },
 {
 "repo": "hashicorp/terraform",
-"file": "terraform/upgrade_resource_state.go",
+"file": "internal/terraform/upgrade_resource_state.go",
 "function": "upgradeResourceState"
 },
 {
 "repo": "hashicorp/terraform-provider-aws",
-"file": "internal/provider/provider.go",
+"file": "internal/provider/factory.go",
 "function": "ProtoV5ProviderServerFactory"
 },
 {
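
Wrong paths like the ones corrected above can be caught mechanically. The sketch below loads a ground-truth file of this shape (an `entries` list with `repo` and `file` keys) and reports entries whose file is absent from the local checkout. It assumes each repository is cloned to `<workspace>/<repo name>`, as the Dockerfiles in this commit do; `missing_ground_truth_files` is a hypothetical helper, and it checks only that the file exists, not that the named function is defined in it.

```python
import json
from pathlib import Path


def missing_ground_truth_files(ground_truth_path: str, workspace: str) -> list[str]:
    """List "repo:file" entries whose file is absent from the local checkout.

    Assumes hashicorp/terraform is cloned to <workspace>/terraform, and so on.
    """
    data = json.loads(Path(ground_truth_path).read_text())
    missing = []
    for entry in data["entries"]:
        checkout = Path(workspace) / entry["repo"].split("/")[-1]
        if not (checkout / entry["file"]).is_file():
            missing.append(f'{entry["repo"]}:{entry["file"]}')
    return missing
```

Inside the task container this might be called as `missing_ground_truth_files("/tests/ground_truth.json", "/workspace")`; an empty list means every listed path resolves at the pinned SHA.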

benchmarks/ccb_docgen/CLAUDE.md

Lines changed: 8 additions & 9 deletions
@@ -4,15 +4,14 @@ This suite tests your ability to generate accurate, comprehensive documentation
 
 ## Search Strategy
 
-**This repository is large.** You MUST use Sourcegraph MCP tools for documentation:
-
-- Use `keyword_search` to find exported functions, public APIs, and type definitions
-- Use `find_references` to understand how APIs are used in practice (for usage examples)
-- Use `go_to_definition` to read full function signatures, parameter types, and return types
-- Use `nls_search` for conceptual queries like "error handling patterns" or "configuration options"
-- Use `read_file` to examine existing documentation, docstrings, and comments
-- Use `list_files` to discover module structure and identify what needs documenting
-- Use `deepsearch` to understand complex subsystems before documenting them
+**This repository is large.** Use code search tools to efficiently navigate:
+
+- Search for exported functions, public APIs, and type definitions by keyword
+- Trace how APIs are used in practice via references (for usage examples)
+- Navigate to definitions to read full function signatures, parameter types, and return types
+- Use semantic search for conceptual queries like "error handling patterns" or "configuration options"
+- Read existing documentation, docstrings, and comments for context
+- Explore module structure to identify what needs documenting
 
 ## Output Requirements
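
To keep this kind of tool-name contamination from creeping back into baseline instructions, a simple scan can flag the terms removed here. The following sketch searches text for the tool names that appear in the deleted lines above plus the words "Sourcegraph" and "MCP". The name list and the `find_contamination` helper are illustrative assumptions; the other suites touched by this commit may use different names.

```python
import re

# Tool names taken from the lines this commit removes; extend as needed.
TOOL_NAMES = [
    "keyword_search", "find_references", "go_to_definition",
    "nls_search", "read_file", "list_files", "deepsearch",
]
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, TOOL_NAMES)) + r")\b|Sourcegraph|\bMCP\b"
)


def find_contamination(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched term) for every flagged term in `text`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Applied to each suite's `CLAUDE.md` and `instruction.md`, a non-empty result would indicate that a baseline file still names a specific tool.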
