Commit 2e52ab4

Fix baseline artifact runner semantics
1 parent b08164e commit 2e52ab4

339 files changed, +5118 -645 lines changed

benchmarks/csb_org_compliance/ccx-compliance-051/instruction.md

Lines changed: 0 additions & 2 deletions
@@ -12,8 +12,6 @@ You are working on a codebase task involving repos from the compliance domain.
 
 No local repositories are pre-checked out.
 
-**Note:** Additional repositories are accessible via Sourcegraph MCP tools:
-*(none — all repos available locally)*
 
 ## Output Format
 

benchmarks/csb_org_compliance/ccx-compliance-052/instruction.md

Lines changed: 0 additions & 5 deletions
@@ -12,11 +12,6 @@ You are working on a codebase task involving repos from the compliance domain.
 
 The local `/workspace/` directory contains: sg-evals/envoy--v1.31.2, sg-evals/data-plane-api--84e84367, sg-evals/go-control-plane--71637ad6, sg-evals/grpc--957dba5e.
 
-**Note:** Additional repositories are accessible via Sourcegraph MCP tools:
-- `sg-evals/envoy--v1.31.2` (envoyproxy/envoy)
-- `sg-evals/data-plane-api--84e84367` (envoyproxy/data-plane-api)
-- `sg-evals/go-control-plane--71637ad6` (envoyproxy/go-control-plane)
-- `sg-evals/grpc--957dba5e` (grpc/grpc)
 
 ## Output Format
 

benchmarks/csb_org_compliance/ccx-compliance-053/instruction.md

Lines changed: 0 additions & 4 deletions
@@ -12,10 +12,6 @@ You are working on a codebase task involving repos from the compliance domain.
 
 The local `/workspace/` directory contains: sg-evals/kafka--0753c489, sg-evals/flink--0cc95fcc, sg-evals/camel--1006f047.
 
-**Note:** Additional repositories are accessible via Sourcegraph MCP tools:
-- `sg-evals/kafka--0753c489` (apache/kafka)
-- `sg-evals/flink--0cc95fcc` (apache/flink)
-- `sg-evals/camel--1006f047` (apache/camel)
 
 ## Output Format
 

benchmarks/csb_org_compliance/ccx-compliance-115/instruction.md

Lines changed: 0 additions & 2 deletions
@@ -12,8 +12,6 @@ You are working on a codebase task involving repos from the compliance domain.
 
 The local `/workspace/` directory contains: sg-evals/django--674eda1c.
 
-**Note:** Additional repositories are accessible via Sourcegraph MCP tools:
-- `sg-evals/django--674eda1c` (django/django)
 
 ## Output Format
 

benchmarks/csb_org_compliance/ccx-compliance-118/instruction.md

Lines changed: 0 additions & 2 deletions
@@ -12,8 +12,6 @@ You are working on a codebase task involving repos from the compliance domain.
 
 The local `/workspace/` directory contains: sg-evals/django--674eda1c.
 
-**Note:** Additional repositories are accessible via Sourcegraph MCP tools:
-- `sg-evals/django--674eda1c` (django/django)
 
 ## Output Format
 

benchmarks/csb_org_compliance/ccx-compliance-124/instruction.md

Lines changed: 0 additions & 2 deletions
@@ -12,8 +12,6 @@ You are working on a codebase task involving repos from the compliance domain.
 
 The local `/workspace/` directory contains: sg-evals/firefox--871325b8.
 
-**Note:** Additional repositories are accessible via Sourcegraph MCP tools:
-- `sg-evals/firefox--871325b8` (mozilla-firefox/firefox)
 
 ## Output Format
 
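
The same edit recurs in each `instruction.md` hunk above: the Sourcegraph MCP note header and the entries directly beneath it are dropped, while the surrounding blank lines are left in place. As a rough illustration only, the Python sketch below reproduces that transformation on a string. The function name `strip_mcp_note` is hypothetical and does not come from the commit, and the rule for what counts as an entry (a `- ` bullet or the `*(none ...)*` placeholder) is an assumption inferred from the hunks shown here.

```python
# Header line removed by every instruction.md hunk in this commit.
NOTE_HEADER = "**Note:** Additional repositories are accessible via Sourcegraph MCP tools:"


def strip_mcp_note(text: str) -> str:
    """Drop the note header and the entry lines directly under it.

    Hypothetical helper: entries are assumed to be "- " bullets or the
    italic "*(none ...)*" placeholder, as seen in the hunks above.
    """
    lines = text.split("\n")
    kept = []
    i = 0
    while i < len(lines):
        if lines[i].strip() == NOTE_HEADER:
            i += 1
            # Skip every entry that belongs to the note block.
            while i < len(lines) and lines[i].startswith(("- ", "*(")):
                i += 1
            continue
        kept.append(lines[i])
        i += 1
    return "\n".join(kept)
```

Blank lines before and after the note are kept on purpose, which matches the two consecutive blank context lines that remain ahead of `## Output Format` in each hunk.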

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# ccx-compliance-182 — artifact_baseline variant
+# Baseline with local code + artifact mode (verifier parses answer.json).
+
+FROM ubuntu:22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Base tools
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    ca-certificates \
+    curl \
+    python3 \
+    golang-go \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /workspace
+
+# Clone local checkout repos (baseline config: agent has local access to these)
+RUN git clone --depth 1 https://github.com/sg-evals/kubernetes--v1.32.0 /workspace/kubernetes--v1.32.0
+RUN git clone --depth 1 https://github.com/sg-evals/client-go--v0.32.0 /workspace/client-go--v0.32.0
+RUN git clone --depth 1 https://github.com/sg-evals/api--v0.32.0 /workspace/api--v0.32.0
+RUN git clone --depth 1 https://github.com/sg-evals/etcd-io-etcd /workspace/etcd-io-etcd
+
+# Initialize git identity for agent commits
+RUN git config --global user.email "agent@example.com" && \
+    git config --global user.name "Agent" && \
+    git config --global safe.directory '*'
+
+# Create log directories
+RUN mkdir -p /logs/agent /logs/verifier
+
+# Pre-create claude user and set ownership at build time so Harbor's
+# runtime chown is a no-op (avoids 15-30 min delay on large repos).
+RUN (adduser --disabled-password --gecos '' claude 2>/dev/null || true) && \
+    for d in /workspace /app /testbed /logs; do [ -d "$d" ] && chown -R claude:claude "$d"; done || true
+
+# Mark artifact-only mode — verifier parses answer.json
+RUN touch /tmp/.artifact_only_mode
+
+ENTRYPOINT []
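
The Dockerfile above creates `/tmp/.artifact_only_mode` as a marker and notes that the verifier parses `answer.json`. The commit excerpt does not show the verifier itself, so the Python sketch below is only a guess at how such a check could look. The function name `load_artifact_answer`, the return convention, and the location of `answer.json` (passed in as a parameter) are all assumptions rather than the project's actual code.

```python
import json
from pathlib import Path

# Sentinel file created by `RUN touch /tmp/.artifact_only_mode` in the Dockerfile.
MARKER = Path("/tmp/.artifact_only_mode")


def load_artifact_answer(answer_path, marker=MARKER):
    """Return the parsed answer.json when artifact-only mode is on, else None.

    Hypothetical helper: where answer.json lives is not stated in the
    commit, so the caller supplies the path.
    """
    if not Path(marker).exists():
        # Marker absent: not an artifact-only run, nothing to parse here.
        return None
    path = Path(answer_path)
    if not path.is_file():
        raise FileNotFoundError(f"artifact mode is on but {path} is missing")
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Taking the marker path as a parameter keeps the sketch testable outside the container; a real verifier may well hard-code both paths.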

benchmarks/csb_org_compliance/ccx-compliance-182/instruction.md

Lines changed: 0 additions & 2 deletions
@@ -12,8 +12,6 @@ You are working on a codebase task involving repos from the compliance domain.
 
 The local `/workspace/` directory contains: sg-evals/kubernetes--v1.32.0, sg-evals/client-go--v0.32.0, sg-evals/api--v0.32.0, sg-evals/etcd-io-etcd.
 
-**Note:** Additional repositories are accessible via Sourcegraph MCP tools:
-- `sg-evals/etcd-io-etcd` (etcd-io/etcd)
 
 ## Output Format
 

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# ccx-compliance-183 — artifact_baseline variant
+# Baseline with local code + artifact mode (verifier parses answer.json).
+
+FROM ubuntu:22.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Base tools
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    ca-certificates \
+    curl \
+    python3 \
+    golang-go \
+    && rm -rf /var/lib/apt/lists/*
+
+WORKDIR /workspace
+
+# Clone local checkout repos (baseline config: agent has local access to these)
+RUN git clone --depth 1 https://github.com/sg-evals/kubernetes--v1.32.0 /workspace/kubernetes--v1.32.0
+RUN git clone --depth 1 https://github.com/sg-evals/client-go--v0.32.0 /workspace/client-go--v0.32.0
+RUN git clone --depth 1 https://github.com/sg-evals/api--v0.32.0 /workspace/api--v0.32.0
+RUN git clone --depth 1 https://github.com/sg-evals/etcd-io-etcd /workspace/etcd-io-etcd
+
+# Initialize git identity for agent commits
+RUN git config --global user.email "agent@example.com" && \
+    git config --global user.name "Agent" && \
+    git config --global safe.directory '*'
+
+# Create log directories
+RUN mkdir -p /logs/agent /logs/verifier
+
+# Pre-create claude user and set ownership at build time so Harbor's
+# runtime chown is a no-op (avoids 15-30 min delay on large repos).
+RUN (adduser --disabled-password --gecos '' claude 2>/dev/null || true) && \
+    for d in /workspace /app /testbed /logs; do [ -d "$d" ] && chown -R claude:claude "$d"; done || true
+
+# Mark artifact-only mode — verifier parses answer.json
+RUN touch /tmp/.artifact_only_mode
+
+ENTRYPOINT []

benchmarks/csb_org_compliance/ccx-compliance-183/instruction.md

Lines changed: 0 additions & 2 deletions
@@ -12,8 +12,6 @@ You are working on a codebase task involving repos from the compliance domain.
 
 The local `/workspace/` directory contains: sg-evals/kubernetes--v1.32.0, sg-evals/client-go--v0.32.0, sg-evals/api--v0.32.0, sg-evals/etcd-io-etcd.
 
-**Note:** Additional repositories are accessible via Sourcegraph MCP tools:
-- `sg-evals/etcd-io-etcd` (etcd-io/etcd)
 
 ## Output Format
 
