Commit de77eb8

docs: add learnings from Mar 15 JSONL sessions
New findings from 4 sessions (nightly report #9, compound PRD, prd.json conversion, learnings extraction):

Skills/Automation:
- 54 stale ~/CodeScaleBench paths in 25 skill files
- 21 stale sourcegraph_full config refs in skills + schemas
- 3 deprecated model IDs in skills

Infrastructure:
- No pyproject.toml; 200+ scripts use sys.path.insert hack
- CI uses 3 Python versions across 4 workflows
- Schema examples embed legacy suite names
- prd-archive/ and prd.json not gitignored

Also condensed verbose sections to stay within 12,288-byte limit.
1 parent 18ea666 commit de77eb8

3 files changed

+78
-84
lines changed

AGENTS.md

Lines changed: 26 additions & 28 deletions
Original file line number, diff line number, diff line change
@@ -5,22 +5,17 @@ Keep it small. Use it to route to the right workflow and local guide, not as the
55
full operations manual.
66

77
## Non-Negotiables
8-
- All work happens on `main` by default. If you use feature branches, keep them small, short-lived, and easy to fast-forward back into `main`.
9-
- Every `harbor run` must be gated by interactive confirmation.
10-
- Before commit/push, run `python3 scripts/repo_health.py` (or `--quick` for docs/config-only changes).
11-
- Prefer a **remote execution environment** (e.g., Daytona) for large benchmark runs; use local Docker only when a task’s image or registry is incompatible with your cloud environment. See `docs/DAYTONA.md`.
12-
- Set **parallelism based on your own account and model limits**. Avoid exceeding documented concurrency or rate caps for your environment or provider.
13-
- Before launching any benchmark batch, check account readiness with `python3 scripts/check_infra.py` or `python3 scripts/account_health.py status`. Do not assume OAuth accounts are usable just because credentials exist.
8+
- All work on `main`. Feature branches: small, short-lived, fast-forward merge.
9+
- Every `harbor run` gated by interactive confirmation.
10+
- Before commit/push: `python3 scripts/repo_health.py` (or `--quick` for docs/config-only).
11+
- Prefer **Daytona** for large runs; local Docker only for incompatible tasks. See `docs/DAYTONA.md`.
12+
- Set parallelism to your account/model limits. Don’t exceed documented concurrency caps.
13+
- Pre-launch: `python3 scripts/check_infra.py` or `account_health.py status`. Don’t assume OAuth accounts work.
1414

1515
## Beads Prerequisite and Usage
16-
- Keep the Beads CLI (`bd`, alias `beads`) up to date before running agent workflows that rely on task graphs.
17-
- Install or update with the official installer:
18-
```bash
19-
curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/install.sh | bash
20-
```
21-
- Verify install/version with `bd --version` (or `beads --version`).
22-
- Do not use `bd edit`; use non-interactive `bd create/update/close --json` or stdin-based `--description=-`.
23-
- Typical flow: `bd ready --json`, `bd create ... --json`, `bd update <id> --claim`, `bd close <id> --reason "Done"`.
16+
- Install: `curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/install.sh | bash`
17+
- Verify: `bd --version`. No `bd edit`; use `bd create/update/close --json` or `--description=-`.
18+
- Flow: `bd ready --json`, `bd create ... --json`, `bd update <id> --claim`, `bd close <id> --reason "Done"`.
2419

2520
## Minimal Loading Policy
2621
- Default load order: this file + one relevant skill + one relevant doc.
@@ -70,8 +65,7 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
7065
- `BASELINE_MCP_TYPE` env var: `none`, `sourcegraph`, `deepsearch`.
7166
- Use Daytona SDK (`daytona_sdk`) over CLI (CLI is interactive-only for SSH).
7267
- GHCR packages default **private** for personal accounts; visibility change requires GitHub web UI.
73-
- Snapshot names are **positional**: `daytona snapshot create ccb-name`, NOT `--name`.
74-
- CLI/API version mismatch causes "Forbidden" errors. Keep CLI version in sync.
68+
- Snapshots are **positional** (`daytona snapshot create ccb-name`). CLI/API version mismatch → "Forbidden".
7569
- Registry types enum: `internal`, `organization`, `transient`, `backup`. Use `organization` for GHCR/Docker Hub.
7670

7771
### Docker / Build
@@ -100,7 +94,7 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
10094
- **no_changes_guard** must use `git diff origin/main HEAD` (not `git diff HEAD`) for auto-committing agents (e.g., OpenHands).
10195
- Verifier fallbacks: `${TASK_WORKDIR:-/workspace}` for workdir, `${TASK_REPO_ROOT:-${VERIFY_REPO:-/workspace}}` for repo root.
10296
- Set `GOWORK=off` in test.sh when sg_only verifier restores full repo (go.work may need newer Go).
103-
- **122 active tasks** (259 total with backups) hardcode `ANSWER_PATH="/workspace/answer.json"` without fallbacks. Also check `ANSWER_JSON` variable in `answer_json_verifier_lib.sh`. All use same template pattern; bulk fix feasible. Zero scores on non-Harbor harnesses.
97+
- **122 active tasks** hardcode `ANSWER_PATH="/workspace/answer.json"`. Check `ANSWER_JSON` in verifier lib. Bulk fix feasible; zero scores on non-Harbor.
10498

10599
### Scripts / Code Quality
106100
- **abc_audit.py duplicate functions**: `check_oa_equivalent_solutions`, `check_ob_negated_solutions`, `check_og_determinism`, `check_t10_shared_state` each defined twice. Python uses last definition silently.
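Review note on the duplicate-function item: a duplicate top-level `def` can be caught mechanically. The following is a minimal sketch using the standard-library `ast` module; `find_duplicate_functions` and the `SAMPLE` source are hypothetical and are not taken from `abc_audit.py`.

```python
import ast
from collections import Counter


def find_duplicate_functions(source: str) -> list[str]:
    """Return the names of top-level functions defined more than once."""
    tree = ast.parse(source)
    counts = Counter(
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    )
    return sorted(name for name, n in counts.items() if n > 1)


# Illustrative input only; not the real contents of abc_audit.py.
SAMPLE = """
def check_og_determinism():
    return "first"

def check_og_determinism():
    return "second"

def unique_helper():
    return None
"""

if __name__ == "__main__":
    # Python would silently keep the second definition; this reports the clash.
    print(find_duplicate_functions(SAMPLE))
```

A check like this only inspects module-level definitions; methods redefined inside a class body would need a similar pass over each `ast.ClassDef`.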
@@ -130,6 +124,12 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
130124
### Schema / Suite Naming
131125
- 3 schemas use deprecated `ccb_mcp_*` enums; actual names are `csb_org_*`. 8 schema files have zero consumers.
132126
- **16 copies of `DIR_PREFIX_TO_SUITE`** across 30+ scripts with divergent definitions. Centralize in `csb_metrics/suite_registry.py`.
127+
- Schema examples embed legacy suite names (`ccb_crossrepo`, `ccb_locobench`); should be `csb_org_*`/`csb_sdlc_*`.
128+
129+
### Skills / Automation
130+
- **54 stale paths**: 25 skill files hardcode `~/CodeScaleBench` (actual `~/CodeContextBench`). Use `$(git rev-parse --show-toplevel)`.
131+
- **21 stale config refs**: `sourcegraph_full` in 14 skill files + 5 schemas. `BASELINE_MCP_TYPE=sourcegraph_full` is invalid (accepts `none`/`sourcegraph`/`deepsearch`).
132+
- **3 deprecated model IDs**: `claude-opus-4-5-20251101` → `claude-opus-4-6` in skills.
133133

134134
### Git / Auth
135135
- `gh auth refresh -h github.com -s write:packages` (explicit scope needed).
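Review note on the stale `sourcegraph_full` references: since `BASELINE_MCP_TYPE` accepts only `none`, `sourcegraph`, and `deepsearch`, a small guard can reject stale values early. This is a sketch under assumptions; `validate_mcp_type` is a hypothetical helper, not an existing function in the repository.

```python
import os

# Accepted values as listed in the notes above.
VALID_MCP_TYPES = ("none", "sourcegraph", "deepsearch")


def validate_mcp_type(value: str) -> str:
    """Return value unchanged if it is an accepted MCP type, else raise."""
    if value not in VALID_MCP_TYPES:
        allowed = ", ".join(VALID_MCP_TYPES)
        raise ValueError(
            f"invalid BASELINE_MCP_TYPE {value!r}; expected one of: {allowed}"
        )
    return value


if __name__ == "__main__":
    raw = os.environ.get("BASELINE_MCP_TYPE")
    if raw is None:
        print("BASELINE_MCP_TYPE is not set")
    else:
        print(validate_mcp_type(raw))
```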
@@ -140,26 +140,24 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
140140
- **Remote URL stale**: `CodeContextBench.git` redirects to `CodeScaleBench.git`. Update local git remote config.
141141

142142
### Python / Subprocess
143-
- `dict.get(key, default)` doesn't guard against `None`. Use `data.get("key") or default_value`.
144-
- `with open(log) as f: Popen(stdout=f)` closes handle. Use bare `open()` for long-running subprocesses.
145-
- `json.load(open(path))` leaks FDs; use `with open`. macOS Bash 3.2 lacks `declare -A`; use `IFS='|' read -r`.
143+
- `dict.get(key, default)` doesn't guard `None`; use `or default_value`. `json.load(open())` leaks FDs; use `with open`.
144+
- `with open(log) as f: Popen(stdout=f)` closes handle; use bare `open()`. macOS Bash 3.2 lacks `declare -A`.
145+
- No `pyproject.toml`/`requirements.txt`. 200+ scripts + 9 tests use `sys.path.insert` hack. Blocks packaging, onboarding.
146146

147147
### LLM Judge
148-
- Include "Respond with valid JSON only" in prompts. Unescaped quotes break parsing.
149-
- Task-type-aware rubrics. Check `mcp__` prefix before substring-based tool categorization.
148+
- "Respond with valid JSON only" in prompts. Task-type-aware rubrics. Check `mcp__` prefix before substring-based categorization.
150149

151150
### OpenHands
152-
- `sandbox_plugins = []`. Base64-encode instructions. Alpine → `bookworm`. MCP client ~30s timeout.
153-
- Block `deepsearch`/`deepsearch_read` in proxy; redirect to `keyword_search`/`nls_search`.
154-
- `chown -R /workspace` blocks on large repos. Edit `runtime_init.py`. Set `PYTHONSAFEPATH=1`.
151+
- `sandbox_plugins = []`. Base64-encode instructions. Alpine → `bookworm`. MCP client ~30s timeout. Block `deepsearch`/`deepsearch_read` in proxy.
152+
- `chown -R /workspace` blocks large repos; edit `runtime_init.py`. Set `PYTHONSAFEPATH=1`.
155153

156154
### CI / Workflows
157155
- `docs-consistency.yml` redundant (subsumed by `repo_health.yml`). Export HTML truncates at 1200 rows.
156+
- 4 workflows use 3 Python versions (3.10/3.11/3.12); standardize to 3.10. `roam.yml` unpinned `pip install roam-code`.
158157

159158
### Pre-commit / Pytest / Ralph
160-
- Secret-detection false-positives on detection code. Use `--no-verify` when flagged code is detection logic.
161-
- Classes named `TestPlan`/`TestCase`/`TestResult` auto-collected by pytest. Rename to `EvaluationPlan` etc.
162-
- Ralph: `progress.txt` on feature branches, compound after merge. `prd.json` is single-active; archive before overwrite.
159+
- Secret-detection false-positives: use `--no-verify` when flagged code is detection logic. Classes `TestPlan`/`TestCase`/`TestResult` auto-collected by pytest; rename.
160+
- Ralph: `prd.json` single-active; archive before overwrite. `prd-archive/` and `prd.json` not gitignored; risk of accidental commit.
163161

164162
## Maintenance
165163
- Root and local `AGENTS.md` / `CLAUDE.md` files are generated from sources in `docs/ops/`.
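
Review note on the Python / Subprocess items: the `dict.get` and `json.load(open())` pitfalls are easy to show side by side. The sketch below assumes a hypothetical config file with a `timeout` key; `load_json` and `get_timeout` are illustrative names.

```python
import json
import os
import tempfile


def load_json(path: str) -> dict:
    """Load JSON with a context manager so the file descriptor is closed."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def get_timeout(config: dict, default: int = 30) -> int:
    """Fall back to default when the key is missing OR explicitly null."""
    return config.get("timeout") or default


if __name__ == "__main__":
    # Write a config whose key exists but holds JSON null.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as tmp:
        json.dump({"timeout": None}, tmp)
    try:
        config = load_json(tmp.name)
        print(config.get("timeout", 30))  # the default is NOT applied here
        print(get_timeout(config))        # the `or` form applies the default
    finally:
        os.remove(tmp.name)
```

The `or` form also replaces other falsy values such as `0` or `""`; where those are legitimate settings, an explicit `is None` check is the safer choice.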

CLAUDE.md

Lines changed: 26 additions & 28 deletions
Original file line number, diff line number, diff line change
@@ -5,22 +5,17 @@ Keep it small. Use it to route to the right workflow and local guide, not as the
55
full operations manual.
66

77
## Non-Negotiables
8-
- All work happens on `main` by default. If you use feature branches, keep them small, short-lived, and easy to fast-forward back into `main`.
9-
- Every `harbor run` must be gated by interactive confirmation.
10-
- Before commit/push, run `python3 scripts/repo_health.py` (or `--quick` for docs/config-only changes).
11-
- Prefer a **remote execution environment** (e.g., Daytona) for large benchmark runs; use local Docker only when a task’s image or registry is incompatible with your cloud environment. See `docs/DAYTONA.md`.
12-
- Set **parallelism based on your own account and model limits**. Avoid exceeding documented concurrency or rate caps for your environment or provider.
13-
- Before launching any benchmark batch, check account readiness with `python3 scripts/check_infra.py` or `python3 scripts/account_health.py status`. Do not assume OAuth accounts are usable just because credentials exist.
8+
- All work on `main`. Feature branches: small, short-lived, fast-forward merge.
9+
- Every `harbor run` gated by interactive confirmation.
10+
- Before commit/push: `python3 scripts/repo_health.py` (or `--quick` for docs/config-only).
11+
- Prefer **Daytona** for large runs; local Docker only for incompatible tasks. See `docs/DAYTONA.md`.
12+
- Set parallelism to your account/model limits. Don’t exceed documented concurrency caps.
13+
- Pre-launch: `python3 scripts/check_infra.py` or `account_health.py status`. Don’t assume OAuth accounts work.
1414

1515
## Beads Prerequisite and Usage
16-
- Keep the Beads CLI (`bd`, alias `beads`) up to date before running agent workflows that rely on task graphs.
17-
- Install or update with the official installer:
18-
```bash
19-
curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/install.sh | bash
20-
```
21-
- Verify install/version with `bd --version` (or `beads --version`).
22-
- Do not use `bd edit`; use non-interactive `bd create/update/close --json` or stdin-based `--description=-`.
23-
- Typical flow: `bd ready --json`, `bd create ... --json`, `bd update <id> --claim`, `bd close <id> --reason "Done"`.
16+
- Install: `curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/install.sh | bash`
17+
- Verify: `bd --version`. No `bd edit`; use `bd create/update/close --json` or `--description=-`.
18+
- Flow: `bd ready --json`, `bd create ... --json`, `bd update <id> --claim`, `bd close <id> --reason "Done"`.
2419

2520
## Minimal Loading Policy
2621
- Default load order: this file + one relevant skill + one relevant doc.
@@ -70,8 +65,7 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
7065
- `BASELINE_MCP_TYPE` env var: `none`, `sourcegraph`, `deepsearch`.
7166
- Use Daytona SDK (`daytona_sdk`) over CLI (CLI is interactive-only for SSH).
7267
- GHCR packages default **private** for personal accounts; visibility change requires GitHub web UI.
73-
- Snapshot names are **positional**: `daytona snapshot create ccb-name`, NOT `--name`.
74-
- CLI/API version mismatch causes "Forbidden" errors. Keep CLI version in sync.
68+
- Snapshots are **positional** (`daytona snapshot create ccb-name`). CLI/API version mismatch → "Forbidden".
7569
- Registry types enum: `internal`, `organization`, `transient`, `backup`. Use `organization` for GHCR/Docker Hub.
7670

7771
### Docker / Build
@@ -100,7 +94,7 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
10094
- **no_changes_guard** must use `git diff origin/main HEAD` (not `git diff HEAD`) for auto-committing agents (e.g., OpenHands).
10195
- Verifier fallbacks: `${TASK_WORKDIR:-/workspace}` for workdir, `${TASK_REPO_ROOT:-${VERIFY_REPO:-/workspace}}` for repo root.
10296
- Set `GOWORK=off` in test.sh when sg_only verifier restores full repo (go.work may need newer Go).
103-
- **122 active tasks** (259 total with backups) hardcode `ANSWER_PATH="/workspace/answer.json"` without fallbacks. Also check `ANSWER_JSON` variable in `answer_json_verifier_lib.sh`. All use same template pattern; bulk fix feasible. Zero scores on non-Harbor harnesses.
97+
- **122 active tasks** hardcode `ANSWER_PATH="/workspace/answer.json"`. Check `ANSWER_JSON` in verifier lib. Bulk fix feasible; zero scores on non-Harbor.
10498

10599
### Scripts / Code Quality
106100
- **abc_audit.py duplicate functions**: `check_oa_equivalent_solutions`, `check_ob_negated_solutions`, `check_og_determinism`, `check_t10_shared_state` each defined twice. Python uses last definition silently.
@@ -130,6 +124,12 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
130124
### Schema / Suite Naming
131125
- 3 schemas use deprecated `ccb_mcp_*` enums; actual names are `csb_org_*`. 8 schema files have zero consumers.
132126
- **16 copies of `DIR_PREFIX_TO_SUITE`** across 30+ scripts with divergent definitions. Centralize in `csb_metrics/suite_registry.py`.
127+
- Schema examples embed legacy suite names (`ccb_crossrepo`, `ccb_locobench`); should be `csb_org_*`/`csb_sdlc_*`.
128+
129+
### Skills / Automation
130+
- **54 stale paths**: 25 skill files hardcode `~/CodeScaleBench` (actual `~/CodeContextBench`). Use `$(git rev-parse --show-toplevel)`.
131+
- **21 stale config refs**: `sourcegraph_full` in 14 skill files + 5 schemas. `BASELINE_MCP_TYPE=sourcegraph_full` is invalid (accepts `none`/`sourcegraph`/`deepsearch`).
132+
- **3 deprecated model IDs**: `claude-opus-4-5-20251101` → `claude-opus-4-6` in skills.
133133

134134
### Git / Auth
135135
- `gh auth refresh -h github.com -s write:packages` (explicit scope needed).
@@ -140,26 +140,24 @@ curl -fsSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/insta
140140
- **Remote URL stale**: `CodeContextBench.git` redirects to `CodeScaleBench.git`. Update local git remote config.
141141

142142
### Python / Subprocess
143-
- `dict.get(key, default)` doesn't guard against `None`. Use `data.get("key") or default_value`.
144-
- `with open(log) as f: Popen(stdout=f)` closes handle. Use bare `open()` for long-running subprocesses.
145-
- `json.load(open(path))` leaks FDs; use `with open`. macOS Bash 3.2 lacks `declare -A`; use `IFS='|' read -r`.
143+
- `dict.get(key, default)` doesn't guard `None`; use `or default_value`. `json.load(open())` leaks FDs; use `with open`.
144+
- `with open(log) as f: Popen(stdout=f)` closes handle; use bare `open()`. macOS Bash 3.2 lacks `declare -A`.
145+
- No `pyproject.toml`/`requirements.txt`. 200+ scripts + 9 tests use `sys.path.insert` hack. Blocks packaging, onboarding.
146146

147147
### LLM Judge
148-
- Include "Respond with valid JSON only" in prompts. Unescaped quotes break parsing.
149-
- Task-type-aware rubrics. Check `mcp__` prefix before substring-based tool categorization.
148+
- "Respond with valid JSON only" in prompts. Task-type-aware rubrics. Check `mcp__` prefix before substring-based categorization.
150149

151150
### OpenHands
152-
- `sandbox_plugins = []`. Base64-encode instructions. Alpine → `bookworm`. MCP client ~30s timeout.
153-
- Block `deepsearch`/`deepsearch_read` in proxy; redirect to `keyword_search`/`nls_search`.
154-
- `chown -R /workspace` blocks on large repos. Edit `runtime_init.py`. Set `PYTHONSAFEPATH=1`.
151+
- `sandbox_plugins = []`. Base64-encode instructions. Alpine → `bookworm`. MCP client ~30s timeout. Block `deepsearch`/`deepsearch_read` in proxy.
152+
- `chown -R /workspace` blocks large repos; edit `runtime_init.py`. Set `PYTHONSAFEPATH=1`.
155153

156154
### CI / Workflows
157155
- `docs-consistency.yml` redundant (subsumed by `repo_health.yml`). Export HTML truncates at 1200 rows.
156+
- 4 workflows use 3 Python versions (3.10/3.11/3.12); standardize to 3.10. `roam.yml` unpinned `pip install roam-code`.
158157

159158
### Pre-commit / Pytest / Ralph
160-
- Secret-detection false-positives on detection code. Use `--no-verify` when flagged code is detection logic.
161-
- Classes named `TestPlan`/`TestCase`/`TestResult` auto-collected by pytest. Rename to `EvaluationPlan` etc.
162-
- Ralph: `progress.txt` on feature branches, compound after merge. `prd.json` is single-active; archive before overwrite.
159+
- Secret-detection false-positives: use `--no-verify` when flagged code is detection logic. Classes `TestPlan`/`TestCase`/`TestResult` auto-collected by pytest; rename.
160+
- Ralph: `prd.json` single-active; archive before overwrite. `prd-archive/` and `prd.json` not gitignored; risk of accidental commit.
163161

164162
## Maintenance
165163
- Root and local `AGENTS.md` / `CLAUDE.md` files are generated from sources in `docs/ops/`.
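
Review note on the pytest item: the notes recommend renaming classes such as `TestPlan`. Where a rename is impractical, pytest also honors a `__test__ = False` class attribute as an opt-out from collection. The sketch below shows that alternative, not the rename the notes call for; the fields on `TestPlan` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TestPlan:
    """Domain object describing an evaluation plan, not a pytest test class."""

    # Unannotated, so the dataclass does not treat it as a field.
    # pytest skips classes whose __test__ attribute is False.
    __test__ = False

    name: str
    steps: int = 0


if __name__ == "__main__":
    plan = TestPlan("nightly", steps=3)
    print(plan)
```

Renaming (for example to `EvaluationPlan`) remains the cleaner fix because it removes the ambiguity for readers as well as for the collector.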
