Commit 10b2a5e

sjarmak and claude committed
fix: update hardcoded paths for CodeContextBench → CodeScaleBench directory rename
Update absolute paths in scripts, agents, and configs that reference /home/stephanie_jarmak/CodeContextBench/ to /home/stephanie_jarmak/CodeScaleBench/ in preparation for directory rename. Also update paper title in README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 87ee736 commit 10b2a5e
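
A rename like this is easy to leave incomplete. As a minimal sketch (not part of this commit), the following scans a directory tree for leftover references to the old directory name. The function name `find_stale_references` and the list of file suffixes are assumptions made for illustration.

```python
from pathlib import Path

OLD_NAME = "CodeContextBench"  # directory name being replaced in this commit


def find_stale_references(root, needle=OLD_NAME,
                          suffixes=(".py", ".sh", ".yaml", ".json", ".md")):
    """Return (path, line_number, line) for every line under root containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only inspect regular files with one of the chosen (hypothetical) suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for path, lineno, line in find_stale_references("."):
        print(f"{path}:{lineno}: {line}")
```

Run from the repository root, this lists any remaining mentions of the old name so they can be reviewed individually, since some may be intentional.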

8 files changed: +534 -8 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # CodeScaleBench
 
-Benchmark suite for evaluating how AI coding agents leverage external context tools on software engineering tasks across the SDLC. Developed as the reproducibility artifact for the paper *"CodeScaleBench: A Systematic Evaluation Framework for Assessing the Impact of Enhanced Code Intelligence on AI Coding Agent Performance."*
+Benchmark suite for evaluating how AI coding agents leverage external context tools on software engineering tasks across the SDLC. Developed as the reproducibility artifact for the paper *"CodeScaleBench: Evaluating Coding Agents on Real-Scale Software Engineering Tasks Across the Development Lifecycle."*
 
 This repository contains **benchmark task definitions**, **evaluation configs**, and a **metrics extraction pipeline**. Tasks are executed via the [Harbor](https://github.com/laude-institute/harbor/tree/main) runner with the Claude Code agent harness.
 

agents/claude_baseline_agent.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
 logger = logging.getLogger(__name__)
 
 # Path to CLAUDE.md template for Deep Search tasks
-LOCOBENCH_CLAUDE_MD_TEMPLATE = Path("/home/stephanie_jarmak/CodeContextBench/benchmarks/locobench_agent/templates/CLAUDE.md")
+LOCOBENCH_CLAUDE_MD_TEMPLATE = Path("/home/stephanie_jarmak/CodeScaleBench/benchmarks/locobench_agent/templates/CLAUDE.md")
 
 # System prompt for evaluation context - delivered via --append-system-prompt for ALL modes
 # This is the single authoritative source of test-first instructions (US-003)

agents/harnesses/base.py

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ class BaselineHarnessMixin:
 
 # Path used by the Claude-specific template; remains available for fallback content.
 LOCOBENCH_CLAUDE_MD_TEMPLATE = Path(
-"/home/stephanie_jarmak/CodeContextBench/benchmarks/locobench_agent/templates/CLAUDE.md"
+"/home/stephanie_jarmak/CodeScaleBench/benchmarks/locobench_agent/templates/CLAUDE.md"
 )
 
 EVALUATION_CONTEXT_PROMPT = """## EVALUATION CONTEXT
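
Both agent modules hardcode the same absolute path, which is why the rename has to touch each of them. One possible alternative, sketched here under assumptions and not what this commit does, resolves the template relative to a repository root. The helper `resolve_template`, the environment variable `CODESCALEBENCH_ROOT`, and the `levels_up` default (chosen so that a module at `agents/harnesses/base.py` maps to the repository root) are all hypothetical.

```python
import os
from pathlib import Path

# Relative location of the template inside the repository (taken from the diff).
TEMPLATE_RELPATH = Path("benchmarks/locobench_agent/templates/CLAUDE.md")


def resolve_template(module_file, levels_up=2, env=None):
    """Resolve the CLAUDE.md template path without hardcoding a home directory.

    module_file: path of the calling module (normally __file__).
    levels_up:   index into Path.parents of the resolved module path; 2 maps
                 agents/harnesses/base.py to the repository root (hypothetical default).
    env:         mapping checked for the hypothetical CODESCALEBENCH_ROOT override.
    """
    env = os.environ if env is None else env
    override = env.get("CODESCALEBENCH_ROOT")
    if override:
        root = Path(override)  # explicit override wins
    else:
        root = Path(module_file).resolve().parents[levels_up]
    return root / TEMPLATE_RELPATH
```

A module directly under `agents/` would pass `levels_up=1`. With this approach a directory rename would not require editing the constant, at the cost of coupling the path to the module's location in the tree.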

configs/control_plane_ccb.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Deterministic control plane for CodeContextBench 2-config runs.
+# Deterministic control plane for CodeScaleBench 2-config runs.
 # Same file + same task source → same experiment_id and run list.
 #
 # Generate manifest:

configs/validate_one_per_benchmark.sh

Lines changed: 1 addition & 1 deletion
@@ -114,7 +114,7 @@ for t in sel['tasks']:
 ")
 
 echo "=============================================="
-echo "CodeContextBench Validation Run (parallel)"
+echo "CodeScaleBench Validation Run (parallel)"
 echo "=============================================="
 if [ "$SG_ONLY" = true ]; then
 echo "Mode: sg_only_env smoke (Dockerfile.sg_only swap)"

scripts/daytona_task_registry.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
 "version": "1.0",
 "generated_by": "scripts/build_daytona_registry.py",
-"benchmarks_dir": "/home/stephanie_jarmak/CodeContextBench/benchmarks",
+"benchmarks_dir": "/home/stephanie_jarmak/CodeScaleBench/benchmarks",
 "summary": {
 "total_tasks": 298,
 "total_suites": 20,

scripts/extract_build_diary.py

Lines changed: 2 additions & 2 deletions
@@ -119,7 +119,7 @@ def parse_transcripts(transcript_dir: str):
 fp = inp.get("file_path", "")
 if fp:
 fp = re.sub(
-r"^/home/stephanie_jarmak/CodeContextBench/",
+r"^/home/stephanie_jarmak/CodeScaleBench/",
 "", fp,
 )
 if name == "Read":
@@ -329,7 +329,7 @@ def main():
 parser.add_argument(
 "--transcript-dir",
 default=os.path.expanduser(
-"~/.claude/projects/-home-stephanie-jarmak-CodeContextBench/"
+"~/.claude/projects/-home-stephanie-jarmak-CodeScaleBench/"
 ),
 )
 parser.add_argument("--output-dir", default="data/build_diary")
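
The two hunks in this file encode the same repository location twice: once as a regex prefix and once as a transcript directory name. As a sketch under assumptions, both could be derived from a single constant. The names `REPO_ROOT`, `strip_repo_prefix`, and `transcript_dir_for` are hypothetical, and the rule that the directory name is the absolute path with `/` and `_` replaced by `-` is inferred only from the one example visible in this diff.

```python
import os
import re

REPO_ROOT = "/home/stephanie_jarmak/CodeScaleBench"  # value taken from the diff


def strip_repo_prefix(file_path, repo_root=REPO_ROOT):
    """Drop a leading '<repo_root>/' from file_path; other paths pass through unchanged."""
    pattern = "^" + re.escape(repo_root.rstrip("/")) + "/"
    return re.sub(pattern, "", file_path)


def transcript_dir_for(repo_root=REPO_ROOT):
    """Build a default transcript directory for a repository path.

    Assumption: the directory name is the path with '/' and '_' replaced by '-'.
    """
    encoded = re.sub(r"[/_]", "-", repo_root.rstrip("/"))
    return os.path.expanduser(os.path.join("~/.claude/projects", encoded)) + "/"
```

Note that a path recorded under the old directory name, such as `/home/stephanie_jarmak/CodeContextBench/scripts/a.py`, does not match the new prefix and is returned unchanged by `strip_repo_prefix`.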
