diff --git a/.claude/agents/security-reviewer.md b/.claude/agents/security-reviewer.md index a56250453..04d4b1380 100644 --- a/.claude/agents/security-reviewer.md +++ b/.claude/agents/security-reviewer.md @@ -1,10 +1,16 @@ +--- +name: security-reviewer +description: Reviews findings from AgentShield + zizmor against the project's CLAUDE.md security rules and grades the result A-F. Spawned by the security-scan skill after the static scans run. +tools: Read, Grep, Glob, Bash(git:*), Bash(rg:*), Bash(grep:*), Bash(find:*), Bash(ls:*), Bash(pnpm exec agentshield:*), Bash(zizmor:*), Bash(command -v:*), Bash(cat:*), Bash(head:*), Bash(tail:*) +--- + You are a security reviewer for Socket Security Node.js repositories. Apply these rules from CLAUDE.md exactly: **Safe File Operations**: Use safeDelete()/safeDeleteSync() from @socketsecurity/lib/fs. NEVER fs.rm(), fs.rmSync(), or rm -rf. Use os.tmpdir() + fs.mkdtemp() for temp dirs. NEVER use fetch() — use httpJson/httpText/httpRequest from @socketsecurity/lib/http-request. -**Absolute Rules**: NEVER use npx, pnpm dlx, or yarn dlx. Use pnpm exec or pnpm run with pinned devDeps. +**Absolute Rules**: NEVER use npx, pnpm dlx, or yarn dlx. Use pnpm exec or pnpm run with pinned devDeps. # zizmor: documentation-prohibition **Work Safeguards**: Scripts modifying multiple files must have backup/rollback. Git operations that rewrite history require explicit confirmation. @@ -12,7 +18,7 @@ Apply these rules from CLAUDE.md exactly: 1. **Secrets**: Hardcoded API keys, passwords, tokens, private keys in code or config 2. **Injection**: Command injection via shell: true or string interpolation in spawn/exec. Path traversal in file operations. -3. **Dependencies**: npx/dlx usage. Unpinned versions (^ or ~). Missing minimumReleaseAge bypass justification. +3. **Dependencies**: npx/dlx usage. Unpinned versions (^ or ~). Missing soak-window bypass justification (pnpm-workspace.yaml `minimumReleaseAgeExclude`). 
# zizmor: documentation-checklist 4. **File operations**: fs.rm without safeDelete. process.chdir usage. fetch() usage (must use lib's httpRequest). 5. **GitHub Actions**: Unpinned action versions (must use full SHA). Secrets outside env blocks. Template injection from untrusted inputs. 6. **Error handling**: Sensitive data in error messages. Stack traces exposed to users. diff --git a/.claude/skills/_shared/path-guard-rule.md b/.claude/skills/_shared/path-guard-rule.md new file mode 100644 index 000000000..2447f8b71 --- /dev/null +++ b/.claude/skills/_shared/path-guard-rule.md @@ -0,0 +1,39 @@ + + +## 1 path, 1 reference + +**A path is *constructed* exactly once. Everywhere else *references* the constructed value.** + +Referencing a single computed path many times is fine — that's the whole point of computing it once. What's banned is *re-constructing* the same path in multiple places, because that's where drift is born. Three concrete shapes: + +1. **Within a package** — every script, test, and lib file that needs a build path imports it from the package's `scripts/paths.mts` (or `lib/paths.mts`). No `path.join('build', mode, ...)` outside that module. + +2. **Across packages** — when package B consumes package A's output, B imports A's `paths.mts` via the workspace `exports` field. Never `path.join(PKG, '..', '', 'build', ...)`. The R28 yoga/ink bug — ink hand-building yoga's wasm path and missing the `wasm/` segment — is the canonical failure mode this rule prevents. + +3. **Workflows, Dockerfiles, shell scripts** — they can't `import` TS, so they construct the string once and reference it everywhere downstream. Workflows: a "Compute paths" step exposes `steps.paths.outputs.final_dir`; later steps read `${{ steps.paths.outputs.final_dir }}`. Dockerfiles/shell: assign once to a variable, reference by name thereafter. Each canonical construction carries a comment naming the source-of-truth `paths.mts` so the YAML can't drift from TS without a flagged change. 
**Re-building** the same path in a second step is the violation, not referring to the constructed value many times. + +Comments that re-state a full path are forbidden. The import statement IS the comment. Docs and READMEs may describe the structure ("output goes under the Final dir") but should not encode a complete `build///out/Final/binary` string — encoded paths get parsed by tools and silently rot. + +Code execution takes priority over docs: violations in `.mts`/`.cts`, Makefiles, Dockerfiles, workflow YAML, and shell scripts are blocking. README and doc-comment violations are advisory unless they contain a fully-qualified path with no parametric placeholders. + +### Three-level enforcement + +- **Hook** — `.claude/hooks/path-guard/` blocks `Edit`/`Write` calls that would introduce a violation in a `.mts`/`.cts` file. Refusal at edit time stops new duplication from landing. +- **Gate** — `scripts/check-paths.mts` runs in `pnpm check`. Fails the build on any violation that isn't allowlisted. +- **Skill** — `/path-guard` audits the repo and fixes findings; `/path-guard check` reports only; `/path-guard install` drops the gate + hook + rule into a fresh repo. + +The mantra is intentionally short so it sticks: **1 path, 1 reference**. When in doubt, find the canonical owner and import from it. diff --git a/.claude/skills/path-guard/SKILL.md b/.claude/skills/path-guard/SKILL.md new file mode 100644 index 000000000..8ff21c2b8 --- /dev/null +++ b/.claude/skills/path-guard/SKILL.md @@ -0,0 +1,250 @@ +--- +name: path-guard +description: Audit and fix path duplication in this Socket repo. Apply the strict "1 path, 1 reference" rule — every build/test/runtime/config path is constructed exactly once; everywhere else references the constructed value. Default mode finds and fixes; `check` mode reports only; `install` mode drops the gate + hook + rule into a fresh repo. 
+user-invocable: true +allowed-tools: Task, Read, Edit, Write, Grep, Glob, AskUserQuestion, Bash(pnpm run check:*), Bash(node scripts/check-paths:*), Bash(rg:*), Bash(grep:*), Bash(find:*), Bash(git:*) +--- + +# path-guard + +**Mantra: 1 path, 1 reference.** A path is constructed exactly once; everywhere else references the constructed value. Re-constructing the same path twice is the violation, not referencing the constructed value many times. + +## Modes + +- `/path-guard` — full audit-and-fix conversion of the current repo (default). +- `/path-guard check` — read-only audit, report violations, no fixes. +- `/path-guard fix ` — fix a single finding from a prior `check` run, by index. +- `/path-guard install` — drop the gate + hook + rule + allowlist into a fresh repo (for new Socket repos). + +## Three-level enforcement + +The strategy lives in three artifacts that ship together: + +1. **CLAUDE.md rule** — the mantra and detection rules in plain language. Every Socket repo's CLAUDE.md carries `## 1 path, 1 reference`. Synced from `.claude/skills/_shared/path-guard-rule.md`. +2. **Hook** — `.claude/hooks/path-guard/index.mts` runs `PreToolUse` on `Edit`/`Write` of `.mts`/`.cts` files. Blocks new violations at edit time. Mandatory across the fleet. +3. **Gate** — `scripts/check-paths.mts` runs in `pnpm check` (and CI). Whole-repo scan. Fails the build on any unsanctioned violation. + +The hook and gate share their stage / build-root / mode / sibling-package vocabulary via `.claude/hooks/path-guard/segments.mts` — a single canonical source. Adding a new stage segment or fleet package means editing one file; the two consumers can never drift on what counts as a build-output path. + +This skill is the *audit-and-fix workflow* that makes a repo conform initially and validates conformance over time. + +## Detection rules + +The gate enforces six rules. The hook enforces a subset (A and B) since it sees only one diff at a time. 
+ +| Rule | What it catches | Where checked | +|---|---|---| +| **A** | Multi-stage `path.join(...)` constructed inline. Two or more "stage" segments (Final, Release, Stripped, Compressed, Optimized, Synced, wasm, downloaded), or one stage + build-root + mode. | `.mts`/`.cts` files outside a `paths.mts`. Hook + gate. | +| **B** | Cross-package traversal: `path.join(*, '..', '', 'build', ...)` reaching into a sibling's output instead of importing via `exports`. | `.mts`/`.cts` files. Hook + gate. | +| **C** | Workflow YAML constructs the same path string in 2+ steps outside a "Compute paths" step. | `.github/workflows/*.yml`. Gate. | +| **D** | Comment encodes a fully-qualified multi-stage path string (e.g. `# build/dev/darwin-arm64/out/Final/binary`). | `.github/workflows/*.yml`. Gate. | +| **F** | Same path shape constructed in 2+ different files. | All scanned files. Gate. | +| **G** | Hand-built multi-stage path constructed 2+ times in the same Makefile/Dockerfile/shell stage. | `Makefile`, `*.mk`, `*.Dockerfile`, `Dockerfile.*`, `*.sh`. Gate. | + +Comments may describe path *structure* with placeholders (`/` or `${BUILD_MODE}/${PLATFORM_ARCH}`) but should not encode a complete literal path string. Code execution takes priority over docs: violations in `.mts`, Makefiles, Dockerfiles, workflow YAML, shell scripts are blocking. + +## Mode: audit-and-fix (default) + +When invoked as `/path-guard` with no arg: + +1. **Setup** — spawn a worktree off `main` per `CLAUDE.md` parallel-sessions rule: + ```bash + git worktree add -b paths-audit ../-paths-audit main + cd ../-paths-audit + ``` + +2. **Audit** — run the gate to enumerate findings: + ```bash + pnpm run check:paths --json > /tmp/paths-findings.json + pnpm run check:paths --explain # human-readable + ``` + +3. **Fix loop** — for each finding, apply the matching pattern below. After each fix, re-run the gate. Stop iterating when `pnpm run check:paths` exits 0. + +4. 
**Verify** — run the full check suite + zizmor on any modified workflow: + ```bash + pnpm check + for w in .github/workflows/*.yml; do zizmor "$w"; done + ``` + +5. **Commit and push** — group fixes by logical category (workflows, code, Dockerfiles). Push directly to `main` for repos that allow direct push, or open a PR for repos that require it (socket-cli, socket-sdk-js, socket-registry per their CLAUDE.md / memory entries). + +## Fix patterns + +### Rule A — Multi-stage path constructed inline (in `.mts`/`.cts`) + +**Bad**: +```ts +const finalBinary = path.join(PACKAGE_ROOT, 'build', BUILD_MODE, PLATFORM_ARCH, 'out', 'Final', 'binary') +``` + +**Fix**: move the construction into the package's `scripts/paths.mts` (or `lib/paths.mts`), or use a build-infra helper: +```ts +// In packages/foo/scripts/paths.mts: +export function getBuildPaths(mode, platformArch) { + // ... constructs once ... + return { outputFinalBinary: path.join(PACKAGE_ROOT, 'build', mode, platformArch, 'out', 'Final', binaryName) } +} + +// In the consumer: +import { getBuildPaths } from './paths.mts' +const { outputFinalBinary } = getBuildPaths(mode, platformArch) +``` + +For binsuite tools (binpress/binflate/binject) the canonical helper is `getFinalBinaryPath(packageRoot, mode, platformArch, binaryName)` from `build-infra/lib/paths`. For download caches use `getDownloadedDir(packageRoot)`. + +### Rule B — Cross-package traversal + +**Bad**: +```ts +const liefDir = path.join(PACKAGE_ROOT, '..', 'lief-builder', 'build', mode, platformArch, 'out', 'Final', 'lief') +``` + +**Fix**: declare the workspace dep, expose `paths.mts` via the producer's `exports`, import the helper: + +1. In producer's `package.json`: + ```json + "exports": { + "./scripts/paths": "./scripts/paths.mts" + } + ``` +2. In consumer's `package.json` `dependencies`: + ```json + "lief-builder": "workspace:*" + ``` +3. 
In consumer: + ```ts + import { getBuildPaths as getLiefBuildPaths } from 'lief-builder/scripts/paths' + const { outputFinalDir } = getLiefBuildPaths(mode, platformArch) + ``` + +### Rule C — Workflow path repetition + +**Bad** (3 steps each rebuilding the same path): +```yaml +- name: Step A + run: cd packages/foo/build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final && do-thing-1 +- name: Step B + run: cd packages/foo/build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final && do-thing-2 +- name: Step C + run: cd packages/foo/build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final && do-thing-3 +``` + +**Fix**: add a "Compute paths" step early in the job that constructs the path once, expose via `$GITHUB_OUTPUT`, reference downstream: + +```yaml +- name: Compute foo paths + id: paths + env: + BUILD_MODE: ${{ steps.build-mode.outputs.mode }} + PLATFORM_ARCH: ${{ steps.platform-arch.outputs.platform_arch }} + run: | + PACKAGE_DIR="packages/foo" + PLATFORM_BUILD_DIR="${PACKAGE_DIR}/build/${BUILD_MODE}/${PLATFORM_ARCH}" + FINAL_DIR="${PLATFORM_BUILD_DIR}/out/Final" + { + echo "package_dir=${PACKAGE_DIR}" + echo "platform_build_dir=${PLATFORM_BUILD_DIR}" + echo "final_dir=${FINAL_DIR}" + } >> "$GITHUB_OUTPUT" + +- name: Step A + env: + FINAL_DIR: ${{ steps.paths.outputs.final_dir }} + run: cd "$FINAL_DIR" && do-thing-1 +# ... etc +``` + +For paths used inside `working-directory: packages/foo` steps, expose a `_rel` companion (e.g. `final_dir_rel=build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final`) and reference that. + +### Rule D — Comment-encoded paths + +**Bad**: +```yaml +# Path: packages/foo/build/dev/darwin-arm64/out/Final/binary +COPY --from=builder /build/.../out/Final/binary /out/Final/binary +``` + +**Fix**: cite the canonical `paths.mts` instead of duplicating the string: +```yaml +# Layout owned by packages/foo/scripts/paths.mts:getBuildPaths(). 
+COPY --from=builder /build/packages/foo/build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final/binary /out/Final/binary +``` + +The comment may describe structure (`/`) but should not be a parsable literal path. + +### Rule G — Dockerfile/Makefile/shell duplicate construction + +**Bad** (Dockerfile reconstructs the path 3 times in the same stage): +```dockerfile +RUN mkdir -p build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final && \ + cp src build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final/output && \ + ls build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final/ +``` + +**Fix**: declare an `ENV` once, reference everywhere: +```dockerfile +# Layout owned by packages/foo/scripts/paths.mts. +ENV FINAL_DIR=build/${BUILD_MODE}/${PLATFORM_ARCH}/out/Final +RUN mkdir -p "$FINAL_DIR" && cp src "$FINAL_DIR/output" && ls "$FINAL_DIR/" +``` + +Each Dockerfile `FROM` stage is its own scope — ENV from the build stage doesn't reach a subsequent `FROM scratch AS export` stage. The gate accounts for this. + +## Mode: check (read-only) + +When invoked as `/path-guard check`: + +```bash +pnpm run check:paths --explain +``` + +Print the gate's findings without making any edits. Exit 0 if clean, 1 if findings present. Useful for CI / pre-merge inspection. + +## Allowlisting a finding + +When a genuine exemption is needed (rare — most "false positives" should be reported as gate bugs), add an entry to `.github/paths-allowlist.yml`. Two ways to pin the entry to a specific site: + +- **`line:`** — exact line number. Strict; a single-line edit above shifts the entry off-target and the finding re-surfaces. +- **`snippet_hash:`** — 12-char SHA-256 prefix of the offending snippet (whitespace-normalized). Drift-resistant: survives reformatting, but any content-changing edit invalidates it. Get the hash: + ```bash + pnpm run check:paths --show-hashes + ``` + +Both may be set — either matching is sufficient. 
Prefer `snippet_hash` over raw `line:` when the exemption is expected to outlive routine reformatting; prefer `line:` when you specifically *want* the entry to fall off after any nearby edit. + +## Mode: install (new repo) + +When invoked as `/path-guard install` on a Socket repo that doesn't yet have the gate: + +1. Copy the gate file from this skill's reference dir: + ```bash + cp .claude/skills/path-guard/reference/check-paths.mts.tmpl scripts/check-paths.mts + ``` +2. Copy the empty allowlist: + ```bash + cp .claude/skills/path-guard/reference/paths-allowlist.yml.tmpl .github/paths-allowlist.yml + ``` +3. Add `"check:paths": "node scripts/check-paths.mts"` to `package.json`. +4. Wire `runPathHygieneCheck()` into `scripts/check.mts` (after the existing checks). +5. Append the rule snippet from `.claude/skills/_shared/path-guard-rule.md` to the repo's `CLAUDE.md` if a `1 path, 1 reference` section is missing. +6. Add the hook entry to `.claude/settings.json` `PreToolUse` matcher `Edit|Write`: + ```json + { "type": "command", "command": "node .claude/hooks/path-guard/index.mts" } + ``` +7. Run the gate against the repo. Triage findings as you would in audit-and-fix mode. + +## Tie-in with quality-scan + +The `/quality-scan` skill should call `pnpm run check:paths --json` as one of its sub-scans and surface findings as part of its A-F graded report. Failures roll into the overall quality grade. The full audit-and-fix workflow lives here; quality-scan just *detects* during periodic scans. + +## Reference patterns + +When converting a repo to the strategy, the patterns I keep reusing: + +- **TS-first packages**: each package owns a `scripts/paths.mts` with `PACKAGE_ROOT`, `BUILD_ROOT`, `getBuildPaths(mode, platformArch)` returning at minimum `outputFinalDir` and `outputFinalBinary`/`outputFinalFile`. +- **Cross-package consumers**: `package.json` `exports` whitelists `./scripts/paths`. Consumer adds `": workspace:*"` and imports. 
+- **Workflows**: each job has a "Compute paths" step (`id: paths`) early in the job. Step outputs include `package_dir`, `platform_build_dir`, `final_dir`, named files. `_rel` companions when `working-directory:` is used. +- **Docker stages**: each `FROM` stage declares `ENV PLATFORM_BUILD_DIR=...` and `ENV FINAL_DIR=...` once. Subsequent RUN steps reference the variables. + +The first repo (socket-btm) is the worked example. Read its `scripts/paths.mts` files and `.github/workflows/*.yml` for canonical patterns when applying the strategy elsewhere. diff --git a/.claude/skills/path-guard/reference/check-paths.mts.tmpl b/.claude/skills/path-guard/reference/check-paths.mts.tmpl new file mode 100644 index 000000000..cbecc71e5 --- /dev/null +++ b/.claude/skills/path-guard/reference/check-paths.mts.tmpl @@ -0,0 +1,947 @@ +#!/usr/bin/env node +/** + * @fileoverview Path-hygiene gate. + * + * Mantra: 1 path, 1 reference. A path is constructed exactly once; + * everywhere else references the constructed value. + * + * Whole-repo scan complementing the per-edit `.claude/hooks/path-guard` + * hook. The hook stops new violations from landing; this gate finds + * the existing ones and blocks merges that introduce more. + * + * Rules enforced: + * + * A — Multi-stage path constructed inline. A `path.join(...)` call + * (or template literal) in a `.mts`/`.cts` file outside a + * `paths.mts` that stitches together two or more "stage" + * segments (Final, Release, Stripped, Compressed, Optimized, + * Synced, wasm, downloaded), or one stage plus a build-root + * (`build`/`out`) plus a mode (`dev`/`prod`/`shared`). The + * construction belongs in the package's `paths.mts` (or a + * build-infra helper); every consumer imports the computed + * value. + * + * B — Cross-package path traversal. A `path.join(*, '..', '', 'build', ...)` reaches into a sibling's build + * output without going through its `exports`. 
The sibling owns + * its layout; consumers declare a workspace dep and import the + * sibling's `paths.mts`. + * + * C — Hand-built workflow path. A `.github/workflows/*.yml` step + * constructs `build/${...}/out//...` inline outside a + * canonical "Compute paths" step. Workflows can carry path + * strings, but the strings are constructed once and exposed via + * step outputs / job env that downstream steps reference. + * + * D — Comment-encoded paths. Comments (in code or YAML) that re-state + * a fully-qualified multi-stage path. Comments may describe the + * structure ("Final dir" or "build//...") but should not + * encode a complete path string that a tool would parse — the + * canonical construction IS the documentation. + * + * F — Same path constructed in multiple places. The same shape of + * multi-stage `path.join(...)` (or workflow `build/${...}/...` + * string template) appearing in two or more files. Construct + * once and import; references of the constructed value are + * unlimited. + * + * G — Hand-built paths in Makefiles, Dockerfiles, and shell scripts. + * Same shape as A, applied to executable artifacts that don't + * run TypeScript. Each canonical construction must carry a + * comment naming the source-of-truth `paths.mts` so the script + * can't drift from TS without a flagged change. + * + * Allowlist: `.github/paths-allowlist.yml`. Each entry needs a + * `reason` so the list stays audit-able. Patterns are deliberately + * narrow — entries should be specific, not blanket. 
+ * + * Usage: + * node scripts/check-paths.mts # default: report + fail + * node scripts/check-paths.mts --explain # long-form explanation + * node scripts/check-paths.mts --json # machine-readable + * node scripts/check-paths.mts --quiet # silent on clean + * + * Exit codes: + * 0 — clean (no findings, or every finding is allowlisted) + * 1 — findings present + * 2 — gate itself crashed + */ + +import { createHash } from 'node:crypto' +import { existsSync, readFileSync, readdirSync } from 'node:fs' +import path from 'node:path' +import process from 'node:process' + +import { fileURLToPath } from 'node:url' + +import { parseArgs } from 'node:util' + +import { + BUILD_ROOT_SEGMENTS, + KNOWN_SIBLING_PACKAGES, + MODE_SEGMENTS, + STAGE_SEGMENTS, +} from '../.claude/hooks/path-guard/segments.mts' + +// Plain stderr/stdout output — no @socketsecurity/lib dependency so +// the gate is self-contained and works in socket-lib itself (which +// would otherwise import itself). +const logger = { + log: (msg: string) => process.stdout.write(msg + '\n'), + error: (msg: string) => process.stderr.write(msg + '\n'), + step: (msg: string) => process.stdout.write(`→ ${msg}\n`), + success: (msg: string) => process.stdout.write(`✔ ${msg}\n`), + substep: (msg: string) => process.stdout.write(` ${msg}\n`), +} + +const __filename = fileURLToPath(import.meta.url) +const __dirname = path.dirname(__filename) +const REPO_ROOT = path.resolve(__dirname, '..') + +// Stage / build-root / mode / sibling-package vocabularies are imported +// from `.claude/hooks/path-guard/segments.mts` (the canonical source). +// Both this gate and the path-guard hook share that single definition +// — Mantra: 1 path, 1 reference. + +// File-path patterns that legitimately enumerate path segments. +const EXEMPT_FILE_PATTERNS: RegExp[] = [ + // Any paths.mts is the canonical constructor. + /(^|\/)paths\.(mts|cts|js)$/, + // Build-infra owns shared helpers that enumerate stages. 
+  /packages\/build-infra\/lib\/paths\.mts$/,
+  /packages\/build-infra\/lib\/constants\.mts$/,
+  // Path-scanning gates that intentionally enumerate.
+  /scripts\/check-paths\.mts$/,
+  /scripts\/check-consistency\.mts$/,
+  /\.claude\/hooks\/path-guard\//,
+  // Allowlist + config files.
+  /\.github\/paths-allowlist\.yml$/,
+]
+
+type Finding = {
+  rule: 'A' | 'B' | 'C' | 'D' | 'F' | 'G'
+  file: string
+  line: number
+  snippet: string
+  message: string
+  fix: string
+}
+
+const findings: Finding[] = []
+
+const args = parseArgs({
+  options: {
+    explain: { type: 'boolean', default: false },
+    json: { type: 'boolean', default: false },
+    quiet: { type: 'boolean', default: false },
+    'show-hashes': { type: 'boolean', default: false },
+  },
+  strict: false,
+})
+
+const isExempt = (filePath: string): boolean =>
+  EXEMPT_FILE_PATTERNS.some(re => re.test(filePath))
+
+// ──────────────────────────────────────────────────────────────────
+// Allowlist loading
+// ──────────────────────────────────────────────────────────────────
+
+type AllowlistEntry = {
+  file?: string
+  pattern?: string
+  rule?: string
+  line?: number
+  snippet_hash?: string
+  reason: string
+}
+
+const loadAllowlist = (): AllowlistEntry[] => {
+  const allowlistPath = path.join(REPO_ROOT, '.github', 'paths-allowlist.yml')
+  if (!existsSync(allowlistPath)) {
+    return []
+  }
+  const text = readFileSync(allowlistPath, 'utf8')
+  // Tiny YAML parser — only the shape we need: list of entries with
+  // `file`, `pattern`, `rule`, `line`, `reason` scalar fields, plus
+  // YAML 1.2 block-scalar indicators `|` (literal) and `>` (folded)
+  // for multi-line reasons. Avoids a yaml dep for a gate that has to
+  // be self-contained.
+  const entries: AllowlistEntry[] = []
+  let current: Partial<AllowlistEntry> | null = null
+  // When set, subsequent more-indented lines fold into this key as a
+  // block scalar (literal '|' keeps newlines, folded '>' joins with
+  // spaces). 
+ let blockKey: string | null = null + let blockKind: '|' | '>' | null = null + let blockIndent = 0 + let blockLines: string[] = [] + const flushBlock = () => { + if (current && blockKey) { + const value = + blockKind === '>' + ? blockLines.join(' ').replace(/\s+/g, ' ').trim() + : blockLines.join('\n').replace(/\n+$/, '') + ;(current as any)[blockKey] = value + } + blockKey = null + blockKind = null + blockLines = [] + } + const indentOf = (line: string): number => { + let i = 0 + while (i < line.length && line[i] === ' ') { + i += 1 + } + return i + } + const lines = text.split('\n') + for (let i = 0; i < lines.length; i++) { + const raw = lines[i]! + const line = raw.replace(/\r$/, '') + // Block-scalar accumulation takes precedence over normal parsing. + if (blockKey !== null) { + if (line.trim() === '') { + // Preserve blank lines inside a literal block; folded blocks + // turn them into paragraph breaks (kept as separate joins). + blockLines.push('') + continue + } + const indent = indentOf(line) + if (indent >= blockIndent) { + blockLines.push(line.slice(blockIndent)) + continue + } + flushBlock() + // Fall through and re-process the dedented line as normal. + } + if (!line.trim() || line.trim().startsWith('#')) { + continue + } + const tryAssign = (key: string, value: string) => { + const trimmed = value.trim() + if (current === null) { + return + } + if (trimmed === '|' || trimmed === '>') { + blockKey = key + blockKind = trimmed as '|' | '>' + blockIndent = indentOf(lines[i + 1] ?? '') || indentOf(line) + 2 + blockLines = [] + return + } + ;(current as any)[key] = + key === 'line' ? Number(unquote(trimmed)) : unquote(trimmed) + } + if (line.startsWith('- ')) { + if (current && current.reason) { + entries.push(current as AllowlistEntry) + } + current = {} + const rest = line.slice(2).trim() + if (rest) { + const m = rest.match(/^([\w-]+):\s*(.*)$/) + if (m) { + tryAssign(m[1]!, m[2]!) 
+ } + } + } else if (current) { + const m = line.match(/^\s+([\w-]+):\s*(.*)$/) + if (m) { + tryAssign(m[1]!, m[2]!) + } + } + } + if (blockKey !== null) { + flushBlock() + } + if (current && current.reason) { + entries.push(current as AllowlistEntry) + } + return entries +} + +const unquote = (s: string): string => { + const t = s.trim() + if ( + (t.startsWith('"') && t.endsWith('"')) || + (t.startsWith("'") && t.endsWith("'")) + ) { + return t.slice(1, -1) + } + return t +} + +const ALLOWLIST = loadAllowlist() + +/** + * Stable, normalized snippet hash. Whitespace-insensitive so trivial + * reformatting (indent change, trailing comma, line wrap) doesn't + * invalidate an allowlist entry, but content-changing edits do. The + * hash exposes only the first 12 hex chars (~48 bits) which is plenty + * for collision-resistance within a single repo's finding set and + * keeps the YAML readable. + */ +const snippetHash = (snippet: string): string => { + const normalized = snippet.replace(/\s+/g, ' ').trim() + return createHash('sha256').update(normalized).digest('hex').slice(0, 12) +} + +/** + * Allowlist matching trades off two failure modes: + * + * - Drift via reformatting (a line shift breaks an entry, the + * finding re-surfaces, devs paper over with a new entry). + * - Stealth allowlisting (an entry pinned to "anywhere in this file" + * silently exempts unrelated future violations). + * + * Strategy: exact line match OR `snippet_hash` match (whitespace- + * normalized SHA-256, first 12 hex). Either is sufficient. Lines stay + * exact (was ±2; the slack let reformatting silently slide), and + * `snippet_hash` provides reformatting-tolerant matching that's still + * tied to the literal text — paste-and-edit cheating would change the + * hash. If neither `line` nor `snippet_hash` is provided, the entry + * matches purely by `rule` + `file` + `pattern` (file-level exempt; + * use sparingly and always pair with a precise `pattern`). 
+ */ +const isAllowlisted = (finding: Finding): boolean => + ALLOWLIST.some(entry => { + if (entry.rule && entry.rule !== finding.rule) { + return false + } + if (entry.file && !finding.file.includes(entry.file)) { + return false + } + if (entry.pattern && !finding.snippet.includes(entry.pattern)) { + return false + } + const lineProvided = entry.line !== undefined + const hashProvided = + typeof entry.snippet_hash === 'string' && entry.snippet_hash.length > 0 + if (lineProvided || hashProvided) { + const lineMatches = lineProvided && entry.line === finding.line + const hashMatches = + hashProvided && entry.snippet_hash === snippetHash(finding.snippet) + if (!(lineMatches || hashMatches)) { + return false + } + } + return true + }) + +// ────────────────────────────────────────────────────────────────── +// File walking +// ────────────────────────────────────────────────────────────────── + +const SKIP_DIRS = new Set([ + '.git', + 'node_modules', + 'build', + 'dist', + 'out', + 'target', + '.cache', + 'upstream', +]) + +const walk = function* ( + dir: string, + filter: (relPath: string) => boolean, +): Generator { + let entries + try { + entries = readdirSync(dir, { withFileTypes: true }) + } catch { + return + } + for (const e of entries) { + if (SKIP_DIRS.has(e.name)) { + continue + } + const full = path.join(dir, e.name) + const rel = path.relative(REPO_ROOT, full) + if (e.isDirectory()) { + yield* walk(full, filter) + } else if (e.isFile() && filter(rel)) { + yield rel + } + } +} + +// ────────────────────────────────────────────────────────────────── +// Rule A + B: code scan (.mts / .cts) +// ────────────────────────────────────────────────────────────────── + +// Locate `path.join(` or `path.resolve(` call sites; argument-list +// extraction uses a paren-balancing scanner below to handle arbitrary +// nesting depth (the previous regex-only approach silently missed any +// argument containing 2+ levels of nested function calls). 
+const PATH_CALL_RE = /\bpath\.(?:join|resolve)\s*\(/g +const STRING_LITERAL_RE = /(['"])((?:\\.|(?!\1)[^\\])*)\1/g + +// Template literal scanner. Captures backtick-delimited strings +// (including those with `${...}` placeholders) so Rule A also catches +// path construction via template literals like +// `${buildDir}/out/Final/${binary}` or `build/${mode}/out/Final`. +const TEMPLATE_LITERAL_RE = + /`((?:\\.|(?:\$\{(?:[^{}]|\{[^{}]*\})*\})|(?!`)[^\\])*)`/g + +/** + * Convert a template-literal body into a synthetic forward-slash path + * by replacing `${...}` placeholders with a sentinel and normalizing + * separators. Returns the sequence of path segments split on `/`. The + * sentinel doesn't match any STAGE/BUILD_ROOT/MODE token, so a + * placeholder-only segment (`${binaryName}`) won't match those sets. + */ +const templateLiteralSegments = (body: string): string[] => { + // Strip placeholders so they don't introduce noise in segments. + // Empty result for a placeholder is fine; downstream filters by set + // membership and skips empties. + const stripped = body.replace(/\$\{(?:[^{}]|\{[^{}]*\})*\}/g, '\x00') + return stripped.split('/').filter(seg => seg.length > 0 && seg !== '\x00') +} + +/** + * Extract every `path.join(...)` and `path.resolve(...)` call from the + * source text, returning each call's literal start offset and argument + * substring. Uses paren-balancing so deeply-nested arguments like + * `path.join(getDir(child(x)), 'build', 'Final')` are captured fully. 
+ */ +const extractPathCalls = ( + source: string, +): Array<{ offset: number; args: string }> => { + const calls: Array<{ offset: number; args: string }> = [] + PATH_CALL_RE.lastIndex = 0 + let match: RegExpExecArray | null + while ((match = PATH_CALL_RE.exec(source)) !== null) { + const callStart = match.index + const argsStart = PATH_CALL_RE.lastIndex + let depth = 1 + let i = argsStart + let inString: '"' | "'" | '`' | null = null + while (i < source.length && depth > 0) { + const ch = source[i]! + if (inString) { + if (ch === '\\') { + i += 2 + continue + } + if (ch === inString) { + inString = null + } + } else { + if (ch === '"' || ch === "'" || ch === '`') { + inString = ch + } else if (ch === '(') { + depth += 1 + } else if (ch === ')') { + depth -= 1 + if (depth === 0) { + break + } + } + } + i += 1 + } + if (depth === 0) { + calls.push({ offset: callStart, args: source.slice(argsStart, i) }) + PATH_CALL_RE.lastIndex = i + 1 + } + } + return calls +} + +const extractStringLiterals = (args: string): string[] => { + const literals: string[] = [] + let match: RegExpExecArray | null + STRING_LITERAL_RE.lastIndex = 0 + while ((match = STRING_LITERAL_RE.exec(args)) !== null) { + if (match[2] !== undefined) { + literals.push(match[2]) + } + } + return literals +} + +const scanCodeFile = (relPath: string): void => { + const full = path.join(REPO_ROOT, relPath) + let content: string + try { + content = readFileSync(full, 'utf8') + } catch { + return + } + const lines = content.split('\n') + // Build a line-offset map so we can map regex offsets back to line + // numbers cheaply. + const lineOffsets: number[] = [0] + for (let i = 0; i < content.length; i++) { + if (content[i] === '\n') { + lineOffsets.push(i + 1) + } + } + const offsetToLine = (offset: number): number => { + let lo = 0 + let hi = lineOffsets.length - 1 + while (lo < hi) { + const mid = (lo + hi + 1) >>> 1 + if (lineOffsets[mid]! 
<= offset) { + lo = mid + } else { + hi = mid - 1 + } + } + return lo + 1 + } + + for (const call of extractPathCalls(content)) { + const literals = extractStringLiterals(call.args) + const stages = literals.filter(l => STAGE_SEGMENTS.has(l)) + const buildRoots = literals.filter(l => BUILD_ROOT_SEGMENTS.has(l)) + const modes = literals.filter(l => MODE_SEGMENTS.has(l)) + + // Rule A: 2+ stages OR (1 stage + 1 build-root + 1 mode). + const triggersA = + stages.length >= 2 || + (stages.length >= 1 && buildRoots.length >= 1 && modes.length >= 1) + if (triggersA) { + const line = offsetToLine(call.offset) + const snippet = (lines[line - 1] ?? '').trim() + findings.push({ + rule: 'A', + file: relPath, + line, + snippet, + message: 'Multi-stage path constructed inline (outside paths.mts).', + fix: 'Construct in the owning paths.mts (or use getFinalBinaryPath / getDownloadedDir from build-infra/lib/paths). Import the computed value here.', + }) + } + + // Rule B: each '..' opens a window; the window stays open only + // until the next non-'..' literal. A sibling-package literal + // *immediately after* a '..' (no path segment between them) + // triggers, AND there must be build context elsewhere in the + // call. Resetting per-segment prevents false positives where '..' + // appears earlier and sibling-name appears much later in an + // unrelated position. + const hasBuildContext = literals.some( + l => BUILD_ROOT_SEGMENTS.has(l) || STAGE_SEGMENTS.has(l), + ) + if (hasBuildContext) { + for (let i = 0; i < literals.length - 1; i++) { + if ( + literals[i] === '..' && + KNOWN_SIBLING_PACKAGES.has(literals[i + 1]!) + ) { + const sibling = literals[i + 1]! + const line = offsetToLine(call.offset) + const snippet = (lines[line - 1] ?? '').trim() + findings.push({ + rule: 'B', + file: relPath, + line, + snippet, + message: `Cross-package traversal into '${sibling}' build output.`, + fix: `Add '${sibling}: workspace:*' as a dep, declare an exports entry on '${sibling}' (e.g. 
'./scripts/paths' → './scripts/paths.mts'), and import the path from there.`, + }) + break + } + } + } + } + + // Rule A (template literal variant). Backtick strings like + // `${buildDir}/out/Final/${binary}` or `build/${mode}/${arch}/out/Final` + // construct paths the same way `path.join(...)` does — flag the + // same shapes. Skip raw imports / template tag positions by + // filtering out leading `import.meta.url`-style / tag positions + // implicitly: TEMPLATE_LITERAL_RE matches any backtick string and + // we rely on segment composition to decide if it's a path. + TEMPLATE_LITERAL_RE.lastIndex = 0 + let tmpl: RegExpExecArray | null + while ((tmpl = TEMPLATE_LITERAL_RE.exec(content)) !== null) { + const body = tmpl[1] ?? '' + if (!body.includes('/')) { + continue + } + const segments = templateLiteralSegments(body) + const stages = segments.filter(s => STAGE_SEGMENTS.has(s)) + const buildRoots = segments.filter(s => BUILD_ROOT_SEGMENTS.has(s)) + const modes = segments.filter(s => MODE_SEGMENTS.has(s)) + // Template literal trigger is tighter than path.join() because + // backtick strings often appear in patch fixtures, error messages, + // and other multi-line content that incidentally contains stage + // tokens like `wasm`. Require the canonical build-output shape: + // - 'build' + 'out' + stage (canonical multi-stage layout), OR + // - 2+ stage segments AND 'out' (e.g. `wasm/out/Final`), OR + // - 'build' + stage + literal mode (back-compat with path.join). + const hasBuildAndOut = + buildRoots.includes('build') && buildRoots.includes('out') + const hasOut = buildRoots.includes('out') + const hasBuild = buildRoots.includes('build') + const triggersA = + (hasBuildAndOut && stages.length >= 1) || + (stages.length >= 2 && hasOut) || + (hasBuild && stages.length >= 1 && modes.length >= 1) + if (triggersA) { + const line = offsetToLine(tmpl.index) + const snippet = (lines[line - 1] ?? 
'').trim() + findings.push({ + rule: 'A', + file: relPath, + line, + snippet, + message: + 'Multi-stage path constructed inline via template literal (outside paths.mts).', + fix: 'Construct in the owning paths.mts (or use getFinalBinaryPath / getDownloadedDir from build-infra/lib/paths). Import the computed value here.', + }) + } + } +} + +// ────────────────────────────────────────────────────────────────── +// Rule C + D: workflow YAML scan +// ────────────────────────────────────────────────────────────────── + +const WORKFLOW_PATH_RE = + /build\/\$\{[^}]+\}\/[^"'`\s]*\/out\/(?:Final|Release|Stripped|Compressed|Optimized|Synced)/g +const WORKFLOW_GH_EXPR_PATH_RE = + /build\/\$\{\{\s*[^}]+\}\}\/[^"'`\s]*\/out\/(?:Final|Release|Stripped|Compressed|Optimized|Synced)/g + +const isInsideComputePathsBlock = ( + lines: string[], + lineIdx: number, +): boolean => { + // Walk backwards up to 60 lines looking for the start of the + // current step. If that step is a "Compute paths" step, the line + // is exempt. + for (let i = lineIdx; i >= Math.max(0, lineIdx - 60); i--) { + const l = lines[i] ?? '' + if (/^\s*-\s*name:/i.test(l)) { + // Step boundary — check if THIS step is a Compute paths step. + // The step body may include `id: paths` even if the name is + // something else (e.g. `id: stub-paths`), so look at the next + // ~20 lines for either marker. + for (let j = i; j < Math.min(lines.length, i + 20); j++) { + const m = lines[j] ?? '' + if ( + /^\s*-\s*name:\s*Compute\s+[\w-]+\s+paths/i.test(m) || + /^\s*id:\s*[\w-]*paths\s*$/i.test(m) + ) { + return true + } + if (j > i && /^\s*-\s*name:/i.test(m)) { + // Hit the next step — current step is NOT Compute paths. 
+ return false + } + } + return false + } + } + return false +} + +const scanWorkflowFile = (relPath: string): void => { + const full = path.join(REPO_ROOT, relPath) + let content: string + try { + content = readFileSync(full, 'utf8') + } catch { + return + } + const lines = content.split('\n') + + // First pass: collect every hand-built path occurrence outside a + // "Compute paths" step. Per the mantra, a single reference is fine + // — what's banned is reconstructing the same path 2+ times. + type PathHit = { + line: number + snippet: string + pathStr: string + } + const occurrences = new Map<string, PathHit[]>() + + for (let i = 0; i < lines.length; i++) { + const line = lines[i]! + if (/^\s*#/.test(line)) { + // Skip comment lines from C scan; they're under D below. + continue + } + if (isInsideComputePathsBlock(lines, i)) { + // Inside the canonical construction step — exempt. + continue + } + WORKFLOW_PATH_RE.lastIndex = 0 + WORKFLOW_GH_EXPR_PATH_RE.lastIndex = 0 + const matches: string[] = [] + let m: RegExpExecArray | null + while ((m = WORKFLOW_PATH_RE.exec(line)) !== null) { + matches.push(m[0]) + } + while ((m = WORKFLOW_GH_EXPR_PATH_RE.exec(line)) !== null) { + matches.push(m[0]) + } + for (const pathStr of matches) { + const list = occurrences.get(pathStr) ?? [] + list.push({ line: i + 1, snippet: line.trim(), pathStr }) + occurrences.set(pathStr, list) + } + } + + // Flag every occurrence of a shape that appears 2+ times. + for (const [pathStr, hits] of occurrences) { + if (hits.length < 2) { + continue + } + for (const hit of hits) { + findings.push({ + rule: 'C', + file: relPath, + line: hit.line, + snippet: hit.snippet, + message: `Workflow constructs the same path ${hits.length} times: ${pathStr}`, + fix: 'Add a "Compute paths" step (id: paths) early in the job that computes this path ONCE and exposes it via $GITHUB_OUTPUT. Reference as ${{ steps.paths.outputs. }} in subsequent steps.
References of the constructed value are unlimited; reconstructing is the violation.', + }) + } + } + + // Rule D: comments encoding a fully-qualified multi-stage path + // (separate scan since it has different semantics). + for (let i = 0; i < lines.length; i++) { + const line = lines[i]! + if (!/^\s*#/.test(line)) { + continue + } + const literalShape = + /build\/(?:dev|prod|shared)\/[a-z0-9-]+\/(?:wasm\/)?out\/(?:Final|Release|Stripped|Compressed|Optimized|Synced)/i + if (literalShape.test(line)) { + findings.push({ + rule: 'D', + file: relPath, + line: i + 1, + snippet: line.trim(), + message: 'Comment encodes a fully-qualified path string.', + fix: 'Cite the canonical paths.mts (e.g. "see packages//scripts/paths.mts:getBuildPaths()") instead of duplicating the path string. Comments may describe structure with placeholders ("/") but should not be a parsable path.', + }) + } + } +} + +// ────────────────────────────────────────────────────────────────── +// Rule G: Makefile / Dockerfile / shell scan +// ────────────────────────────────────────────────────────────────── + +const SCRIPT_HAND_BUILT_RE = + /build\/\$?\{?(?:BUILD_MODE|MODE|prod|dev)\}?\/[\w${}.-]*\/out\/(?:Final|Release|Stripped|Compressed|Optimized|Synced)/g + +const scanScriptFile = (relPath: string): void => { + const full = path.join(REPO_ROOT, relPath) + let content: string + try { + content = readFileSync(full, 'utf8') + } catch { + return + } + const lines = content.split('\n') + const isDockerfile = + /Dockerfile/i.test(relPath) || /\.glibc$|\.musl$/.test(relPath) + + // First pass: collect every multi-stage path occurrence in this file, + // scoped per Dockerfile stage (each `FROM ... AS ...` starts a new + // scope where ENV/ARG don't propagate). + type Hit = { line: number; text: string; pathStr: string; stage: number } + const hits: Hit[] = [] + let stage = 0 + for (let i = 0; i < lines.length; i++) { + const line = lines[i]! 
+ if (/^\s*#/.test(line)) { + // Skip comments — documentation, not construction. + continue + } + if (isDockerfile && /^FROM\s+/i.test(line)) { + stage += 1 + continue + } + SCRIPT_HAND_BUILT_RE.lastIndex = 0 + let m: RegExpExecArray | null + while ((m = SCRIPT_HAND_BUILT_RE.exec(line)) !== null) { + hits.push({ + line: i + 1, + text: line.trim(), + pathStr: m[0], + stage, + }) + } + } + + // Group by (stage, pathStr) — only flag when a path is built 2+ + // times within the SAME Dockerfile stage (or anywhere in non- + // Dockerfile scripts, where stages don't apply). + const grouped = new Map<string, Hit[]>() + for (const h of hits) { + const key = `${h.stage}::${h.pathStr}` + const list = grouped.get(key) ?? [] + list.push(h) + grouped.set(key, list) + } + for (const [, list] of grouped) { + if (list.length < 2) { + continue + } + for (const hit of list) { + findings.push({ + rule: 'G', + file: relPath, + line: hit.line, + snippet: hit.text, + message: `Hand-built multi-stage path constructed ${list.length} times in this file: ${hit.pathStr}`, + fix: 'Assign to a variable / ENV once near the top of the script / Dockerfile stage, with a comment naming the canonical paths.mts. Reference the variable everywhere downstream. References of a single construction are unlimited; reconstructing the same path is the violation.', + }) + } + } +} + +// ────────────────────────────────────────────────────────────────── +// Rule F: cross-file path repetition +// ────────────────────────────────────────────────────────────────── + +const checkRuleF = (): void => { + // A path is "constructed" each time we see a new path.join with a + // matching shape. Group findings of Rule A by their snippet shape; + // when the same shape appears in 2+ files, promote them to Rule F so + // the message is more accurate.
+ const byShape = new Map<string, Finding[]>() + for (const f of findings) { + if (f.rule !== 'A') { + continue + } + // Normalize: strip whitespace, identifiers, surrounding context; + // keep just the literal path-segment shape. + const literalsRe = /'[^']*'|"[^"]*"/g + const literals = (f.snippet.match(literalsRe) ?? []).join(',') + if (!literals) { + continue + } + const list = byShape.get(literals) ?? [] + list.push(f) + byShape.set(literals, list) + } + for (const [shape, list] of byShape) { + if (list.length < 2) { + continue + } + // Promote each Rule-A finding in this group to Rule F so the + // message tells the reader the issue is cross-file repetition, + // not just a single hand-build. + for (const f of list) { + f.rule = 'F' + f.message = `Same path shape constructed in ${list.length} places: ${shape.slice(0, 100)}` + f.fix = + 'Construct this path ONCE in a paths.mts (or build-infra helper) and import the computed value. References of the computed variable are unlimited; re-constructing the same shape twice is the violation.' + } + } +} + +// ────────────────────────────────────────────────────────────────── +// Main +// ────────────────────────────────────────────────────────────────── + +const main = (): number => { + // Scan code files (Rule A + B). + for (const rel of walk( + REPO_ROOT, + p => p.endsWith('.mts') || p.endsWith('.cts'), + )) { + if (isExempt(rel)) { + continue + } + scanCodeFile(rel) + } + // Scan workflows (Rule C + D). + const workflowDir = path.join(REPO_ROOT, '.github', 'workflows') + if (existsSync(workflowDir)) { + for (const rel of walk(workflowDir, p => p.endsWith('.yml'))) { + if (isExempt(rel)) { + continue + } + scanWorkflowFile(rel) + } + } + // Scan scripts/Makefiles/Dockerfiles (Rule G).
+ for (const rel of walk(REPO_ROOT, p => { + const base = path.basename(p) + return ( + base === 'Makefile' || + base.endsWith('.mk') || + base.endsWith('.Dockerfile') || + base === 'Dockerfile' || + base.endsWith('.glibc') || + base.endsWith('.musl') || + (base.endsWith('.sh') && !p.includes('test/')) + ) + })) { + if (isExempt(rel)) { + continue + } + scanScriptFile(rel) + } + // Promote cross-file Rule-A repeats to Rule F. + checkRuleF() + + // Filter against allowlist. + const blocking = findings.filter(f => !isAllowlisted(f)) + + if (args.values.json) { + process.stdout.write( + JSON.stringify( + { findings: blocking, allowlisted: findings.length - blocking.length }, + null, + 2, + ) + '\n', + ) + return blocking.length === 0 ? 0 : 1 + } + + if (blocking.length === 0) { + if (!args.values.quiet) { + logger.success('Path-hygiene check passed (1 path, 1 reference)') + if (findings.length > 0) { + logger.substep(`${findings.length} finding(s) allowlisted`) + } + } + return 0 + } + + logger.error(`Path-hygiene check FAILED — ${blocking.length} finding(s)`) + logger.log('') + logger.log('Mantra: 1 path, 1 reference') + logger.log('') + for (const f of blocking) { + logger.log(` [${f.rule}] ${f.file}:${f.line}`) + logger.log(` ${f.snippet}`) + logger.log(` → ${f.message}`) + if (args.values['show-hashes']) { + logger.log(` snippet_hash: ${snippetHash(f.snippet)}`) + } + if (args.values.explain) { + logger.log(` Fix: ${f.fix}`) + } + logger.log('') + } + if (!args.values.explain) { + logger.log('Run with --explain to see fix suggestions per finding.') + logger.log( + 'Add intentional exceptions to .github/paths-allowlist.yml with a `reason` field.', + ) + logger.log( + 'Run with --show-hashes to print the snippet_hash for each finding (drift-resistant allowlisting).', + ) + } + return 1 +} + +try { + process.exitCode = main() +} catch (e) { + logger.error(`Path-hygiene gate crashed: ${e}`) + process.exitCode = 2 +} diff --git 
a/.claude/skills/path-guard/reference/claude-md-rule.md b/.claude/skills/path-guard/reference/claude-md-rule.md new file mode 100644 index 000000000..3e32b1bae --- /dev/null +++ b/.claude/skills/path-guard/reference/claude-md-rule.md @@ -0,0 +1,29 @@ + + +## 1 path, 1 reference + +**A path is *constructed* exactly once. Everywhere else *references* the constructed value.** + +Referencing a single computed path many times is fine — that's the whole point of computing it once. What's banned is *re-constructing* the same path in multiple places, because that's where drift is born. + +Three concrete shapes: + +1. **Within a package** — every script, test, and lib file that needs a build path imports it from the package's `scripts/paths.mts` (or `lib/paths.mts`). No `path.join('build', mode, ...)` outside that module. + +2. **Across packages** — when package B consumes package A's output, B imports A's `paths.mts` via the workspace `exports` field. Never `path.join(PKG, '..', '', 'build', ...)`. The R28 yoga/ink bug — ink hand-building yoga's wasm path and missing the `wasm/` segment — is the canonical failure mode this rule prevents. + +3. **Workflows, Dockerfiles, shell scripts** — they can't `import` TS, so they construct the string once and reference it everywhere downstream. Workflows: a "Compute paths" step exposes `steps.paths.outputs.final_dir`; later steps read `${{ steps.paths.outputs.final_dir }}`. Dockerfiles/shell: assign once to a variable / `ENV`, reference by name thereafter. Each canonical construction carries a comment naming the source-of-truth `paths.mts`. **Re-building** the same path in a second step is the violation, not referring to the constructed value many times. + +Comments may describe path *structure* with placeholders ("`/`" or "`${BUILD_MODE}/${PLATFORM_ARCH}`") but should not encode a complete literal path string. 
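Shape 1 can be sketched as a minimal package-local paths module. The module name, directory layout, and helper names below are illustrative assumptions, not this repo's actual `paths.mts`:

```typescript
// Hypothetical scripts/paths.mts — the package's single source of
// truth for build paths. Every other script imports from here.
import path from 'node:path'

export type BuildMode = 'dev' | 'prod'

// Illustrative root and segment names, not the real repo layout.
const PKG_ROOT = process.cwd()
const BUILD_ROOT = 'build'

// Constructed exactly once; consumers reference the computed value.
export const getBuildDir = (mode: BuildMode): string =>
  path.join(PKG_ROOT, BUILD_ROOT, mode)

export const getFinalBinaryPath = (mode: BuildMode, binary: string): string =>
  path.join(getBuildDir(mode), 'out', 'Final', binary)
```

A consumer then writes `import { getFinalBinaryPath } from './scripts/paths.mts'` instead of re-joining `'build'`, the mode, and `'out'` itself — which is exactly the duplication the gate flags.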
Code execution takes priority over docs: violations in `.mts`/`.cts`, Makefiles, Dockerfiles, workflow YAML, and shell scripts are blocking. README and doc-comment violations are advisory unless they contain a fully-qualified path with no parametric placeholders. + +### Three-level enforcement + +- **Hook** — `.claude/hooks/path-guard/` blocks `Edit`/`Write` calls that would introduce a violation in a `.mts`/`.cts` file. Refusal at edit time stops new duplication from landing. +- **Gate** — `scripts/check-paths.mts` runs in `pnpm check`. Fails the build on any violation that isn't allowlisted in `.github/paths-allowlist.yml`. +- **Skill** — `/path-guard` audits the repo and fixes findings; `/path-guard check` reports only; `/path-guard install` drops the gate + hook + rule into a fresh repo. + +The mantra is intentionally short so it sticks: **1 path, 1 reference**. When in doubt, find the canonical owner and import from it. diff --git a/.claude/skills/path-guard/reference/paths-allowlist.yml.tmpl b/.claude/skills/path-guard/reference/paths-allowlist.yml.tmpl new file mode 100644 index 000000000..e2746660c --- /dev/null +++ b/.claude/skills/path-guard/reference/paths-allowlist.yml.tmpl @@ -0,0 +1,28 @@ +# Path-hygiene gate allowlist. +# Mantra: 1 path, 1 reference. +# +# Each entry exempts a specific finding from `scripts/check-paths.mts`. +# Entries MUST carry a `reason` so the list stays audit-able and +# entries can be removed when the underlying code changes. +# +# Schema (all top-level keys optional except `reason`): +# +# - rule: Rule letter (A, B, C, D, F, G). Omit to match any rule. +# file: Substring match against the relative file path. +# pattern: Substring match against the offending snippet. +# line: Line number; matches if within ±2 of the finding. +# reason: Why this site is genuinely exempt. Required. +# +# Prefer narrow entries (rule + file + line + pattern) over blanket +# `file:` entries that exempt the whole file. 
Genuine exemptions are +# rare — most "false positives" should be reported as gate bugs. +# +# Example: +# +# - rule: A +# file: packages/foo/scripts/legacy-build.mts +# line: 42 +# pattern: "path.join(testDir, 'out', 'Final')" +# reason: | +# legacy-build.mts is scheduled for removal in v2.0; refactoring +# its path construction now would conflict with the rewrite. diff --git a/.claude/skills/programmatic-claude-lockdown/SKILL.md b/.claude/skills/programmatic-claude-lockdown/SKILL.md new file mode 100644 index 000000000..f2561013a --- /dev/null +++ b/.claude/skills/programmatic-claude-lockdown/SKILL.md @@ -0,0 +1,84 @@ +--- +name: programmatic-claude-lockdown +description: Reference for locking down programmatic Claude invocations (the `claude` CLI in workflows/scripts, the `@anthropic-ai/claude-agent-sdk` `query()` in code). Loads on demand when writing or reviewing any callsite that runs Claude programmatically. Source: https://code.claude.com/docs/en/agent-sdk/permissions. +user-invocable: false +allowed-tools: Read, Grep, Glob +--- + +# Programmatic Claude lockdown + +**Rule:** every programmatic Claude callsite sets four flags. Skip any one and a future edit silently widens the surface. + +## The four flags + +| Layer | SDK option | CLI flag | What it does | +|---|---|---|---| +| Definition | `tools` | `--tools` | Base set the model is told about. Tools not listed are invisible — no `tool_use` block possible. | +| Auto-approve | `allowedTools` | `--allowedTools` | Step 4. Listed tools run without invoking `canUseTool`. | +| Deny | `disallowedTools` | `--disallowedTools` | Step 2. Wins even against `bypassPermissions`. Defense-in-depth. | +| Mode | `permissionMode: 'dontAsk'` | `--permission-mode dontAsk` | Step 3. Unmatched tools denied without falling through to a missing `canUseTool`. | + +The official permission flow (1) hooks → (2) deny rules → (3) permission mode → (4) allow rules → (5) `canUseTool`. In `dontAsk` mode step 5 is skipped — denied. 
The doc states verbatim: *"`allowedTools` and `disallowedTools` ... control whether a tool call is approved, not whether the tool is available."* Availability is `tools`. + +## Recipe — read-only agent (audit, classify, summarize) + +```ts +import { query } from '@anthropic-ai/claude-agent-sdk' + +query({ + prompt: '...', + options: { + tools: ['Read', 'Grep', 'Glob'], + allowedTools: ['Read', 'Grep', 'Glob'], + disallowedTools: ['Agent', 'Bash', 'Edit', 'NotebookEdit', 'Task', 'WebFetch', 'WebSearch', 'Write'], + permissionMode: 'dontAsk', + }, +}) +``` + +CLI form for workflow YAML / shell scripts: + +```yaml +claude --print \ + --tools "Read" "Grep" "Glob" \ + --allowedTools "Read" "Grep" "Glob" \ + --disallowedTools "Agent" "Bash" "Edit" "NotebookEdit" "Task" "WebFetch" "WebSearch" "Write" \ + --permission-mode dontAsk \ + --model "$MODEL" \ + --max-turns 25 \ + "" +``` + +## Recipe — agent that needs Bash (e.g. `/updating`: pnpm + git + jq) + +Narrow `Bash(...)` patterns surgically. Block dangerous Bash patterns explicitly. Fleet rules: no `npx`/`pnpm dlx`/`yarn dlx`; no `curl`/`wget` exfil; no destructive `rm -rf`; no `sudo`. 
Build the deny list as shell vars so the npx/dlx denials can carry the `# zizmor:` exemption marker (the pre-commit `scanNpxDlx` hook treats those literal strings as the prohibited tools, not as exemptions, unless the line is tagged): + +```yaml +DISALLOW_BASE='Agent Task NotebookEdit WebFetch WebSearch Bash(curl:*) Bash(wget:*) Bash(rm -rf*) Bash(sudo:*)' +DISALLOW_PKG_EXEC='Bash(npx:*) Bash(pnpm dlx:*) Bash(yarn dlx:*)' # zizmor: documentation-prohibition +claude --print \ + --tools "Bash" "Read" "Write" "Edit" "Glob" "Grep" \ + --allowedTools "Bash(pnpm:*)" "Bash(git:*)" "Bash(jq:*)" "Read" "Write" "Edit" "Glob" "Grep" \ + --disallowedTools $DISALLOW_BASE $DISALLOW_PKG_EXEC \ + --permission-mode dontAsk \ + --model "$MODEL" --max-turns 25 \ + "" +``` + +## Never + +- ❌ `permissionMode: 'default'` in headless contexts — falls through to a missing `canUseTool`. Behavior undefined. +- ❌ `permissionMode: 'bypassPermissions'` / `allowDangerouslySkipPermissions: true`. +- ❌ Omitting `tools` — SDK default is the full claude_code preset. +- ❌ `Agent` / `Task` permitted — sub-agents inherit modes and can escape per-subagent restrictions when the parent is `bypassPermissions`/`acceptEdits`/`auto`. + +## Reference implementation + +`socket-lib/tools/prim/src/disambiguate.mts` — canonical SDK-form callsite. The file header documents each flag against the eval-flow step it enforces. + +`socket-lib/tools/prim/test/disambiguate.test.mts` — source-text guards that fail the build if `BASE_TOOLS` widens, if `tools: BASE_TOOLS` is unwired, if `permissionMode` drifts from `'dontAsk'`, or if `bypassPermissions` / `allowDangerouslySkipPermissions: true` ever appears. Mirror this pattern in any new callsite. + +## Existing fleet callsites + +- `socket-registry/.github/workflows/weekly-update.yml` — two `claude --print` invocations (run `/updating` skill, fix test failures). Bash recipe above. 
+- socket-lib/tools/prim/src/disambiguate.mts — read-only recipe above (`query()` SDK form). diff --git a/.claude/skills/promise-race-pitfall/SKILL.md b/.claude/skills/promise-race-pitfall/SKILL.md new file mode 100644 index 000000000..d38f3c2a8 --- /dev/null +++ b/.claude/skills/promise-race-pitfall/SKILL.md @@ -0,0 +1,57 @@ +--- +name: promise-race-pitfall +description: Reference for the `Promise.race` cross-iteration handler-leak bug. Loads on demand when writing or reviewing concurrency code that uses `Promise.race`, `Promise.any`, or hand-rolled concurrency limiters. +--- + +# Promise.race in loops — the handler-leak pitfall + +**Never re-race the same pool of promises across loop iterations.** Each call to `Promise.race([A, B, …])` attaches fresh `.then` handlers to every arm. A promise that survives N iterations accumulates N handler sets. See [nodejs/node#17469](https://github.com/nodejs/node/issues/17469) and [`@watchable/unpromise`](https://github.com/watchable/unpromise). + +## Patterns + +- **Safe** — both arms created per call: + + ```ts + const value = await Promise.race([ + fetchSomething(), + new Promise<never>((_, r) => setTimeout(() => r(new Error('timeout')), 5000)), + ]) + ``` + +- **Leaky** — `pool` survives across iterations, accumulating handlers: + + ```ts + while (queue.length) { + const winner = await Promise.race(pool) // ← N handlers per arm by iteration N + pool = pool.filter(p => p !== winner) + } + ``` + + Same hazard for `Promise.any` and any long-lived arm such as an interrupt signal. + +## The fix + +Use a single-waiter "slot available" signal. Each task's `.then` resolves a one-shot `Promise.withResolvers()` deferred that the loop awaits, then replaces. No persistent pool, nothing to stack.
+ +```ts +let running = 0 +let signal = Promise.withResolvers<Task>() +function startTask(task: Task) { + running += 1 + task.run().then(() => { + const prev = signal + signal = Promise.withResolvers<Task>() + prev.resolve(task) + }) +} +while (queue.length || running > 0) { + // launch up to N tasks + while (running < N && queue.length) startTask(queue.shift()!) + const finished = await signal.promise // the task that just completed + running -= 1 +} +``` + +The arm being awaited is *always fresh*; nothing accumulates handlers. + +## Quick check + +Before merging concurrency code, ask: *does any arm of a `Promise.race`/`Promise.any` outlive the call?* If yes, refactor to the single-waiter signal. diff --git a/.claude/skills/security-scan/SKILL.md b/.claude/skills/security-scan/SKILL.md index 7f2fd77e8..10a3ac3f2 100644 --- a/.claude/skills/security-scan/SKILL.md +++ b/.claude/skills/security-scan/SKILL.md @@ -2,6 +2,7 @@ name: security-scan description: Runs a multi-tool security scan — AgentShield for Claude config, zizmor for GitHub Actions, and optionally Socket CLI for dependency scanning. Produces an A-F graded security report. Use after modifying `.claude/` config, hooks, agents, or GitHub Actions workflows, and before releases. user-invocable: true +allowed-tools: Task, Read, Bash(pnpm exec agentshield:*), Bash(zizmor:*), Bash(command -v:*), Bash(find .cache/external-tools/zizmor:*) --- # Security Scan diff --git a/.github/paths-allowlist.yml b/.github/paths-allowlist.yml new file mode 100644 index 000000000..11ce3257f --- /dev/null +++ b/.github/paths-allowlist.yml @@ -0,0 +1,44 @@ +# Path-hygiene gate allowlist. +# Mantra: 1 path, 1 reference. +# +# Each entry exempts a specific finding from `scripts/check-paths.mts`. +# Entries MUST carry a `reason` so the list stays audit-able and +# entries can be removed when the underlying code changes. +# +# Schema (all top-level keys optional except `reason`): +# +# - rule: Rule letter (A, B, C, D, F, G). Omit to match any rule. +# file: Substring match against the relative file path.
+# pattern: Substring match against the offending snippet. +# line: Exact line number. Strict — no fuzz tolerance. +# snippet_hash: 12-char SHA-256 prefix of the normalized snippet +# (whitespace collapsed). Drift-resistant: the entry +# keeps matching after reformatting that doesn't +# change the offending construction. Get the hash by +# running `node scripts/check-paths.mts --show-hashes`. +# reason: Why this site is genuinely exempt. Required. +# +# Match policy: if `line` is provided it must match exactly. If +# `snippet_hash` is provided it must match exactly. Both may be set — +# either one matching is sufficient (so a code reformat that keeps +# the snippet but moves the line still matches via hash, and a +# reformat that changes the snippet but keeps the line still matches +# via line). If neither is set, `file` + `pattern` + `rule` matching +# is used (broader; prefer narrow entries when possible). +# +# Prefer narrow entries (rule + file + snippet_hash + pattern) over +# blanket `file:` entries that exempt the whole file. Genuine +# exemptions are rare — most "false positives" should be reported +# as gate bugs. +# +# Example: +# +# - rule: A +# file: packages/foo/scripts/legacy-build.mts +# snippet_hash: a1b2c3d4e5f6 +# pattern: "path.join(testDir, 'out', 'Final')" +# reason: | +# legacy-build.mts is scheduled for removal in v2.0; refactoring +# its path construction now would conflict with the rewrite. + +# (No allowlist entries yet — socket-btm is meant to be clean.) diff --git a/.socket-repo-template.json b/.socket-repo-template.json new file mode 100644 index 000000000..0c5a0447b --- /dev/null +++ b/.socket-repo-template.json @@ -0,0 +1,6 @@ +{ + "$schema": "./socket-repo-template-schema.json", + "schemaVersion": 1, + "repoName": "socket-cli", + "kind": "consumer" +} diff --git a/CLAUDE.md b/CLAUDE.md index cf06225ff..78c151511 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,132 +2,132 @@ **MANDATORY**: Act as principal-level engineer. 
Follow these guidelines exactly. -## USER CONTEXT + -- Identify users by git credentials (commit author, GitHub account). Use their actual name, never "the user". -- Use "you/your" when speaking directly; use their name when discussing their work. +## 📚 Fleet Standards -## PARALLEL CLAUDE SESSIONS - WORKTREE REQUIRED +### Identifying users -**This repo may have multiple Claude sessions running concurrently against the same checkout, against parallel git worktrees, or against sibling clones.** Several common git operations are hostile to that and silently destroy or hijack the other session's work. +Identify users by git credentials and use their actual name. Use "you/your" when speaking directly; use names when referencing contributions. -- **FORBIDDEN in the primary checkout** (the one another Claude may be editing): - - `git stash` — shared stash store; another session can `pop` yours. - - `git add -A` / `git add .` — sweeps files belonging to other sessions. - - `git checkout ` / `git switch ` — yanks the working tree out from under another session. - - `git reset --hard` against a non-HEAD ref — discards another session's commits. -- **REQUIRED for branch work**: spawn a worktree instead of switching branches in place. Each worktree has its own HEAD, so branch operations inside it are safe. +### Parallel Claude sessions - ```bash - # From the primary checkout — does NOT touch the working tree here. - git worktree add -b ../- main - cd ../- - # edit, commit, push from here; the primary checkout is untouched. - cd - - git worktree remove ../- - ``` +This repo may have multiple Claude sessions running concurrently against the same checkout, against parallel git worktrees, or against sibling clones. Several common git operations are hostile to that. -- **REQUIRED for staging**: surgical `git add […]` with explicit paths. Never `-A` / `.`. -- **If you need a quick WIP save**: commit on a new branch from inside a worktree, not a stash. 
+**Forbidden in the primary checkout:**
+
+- `git stash` — shared store; another session can `pop` yours
+- `git add -A` / `git add .` — sweeps files from other sessions
+- `git checkout <branch>` / `git switch <branch>` — yanks the working tree out from under another session
+- `git reset --hard` against a non-HEAD ref — discards another session's commits
+
+**Required for branch work:** spawn a worktree.
+
+```bash
+git worktree add -b <branch> ../<repo>-<branch> main
+cd ../<repo>-<branch>
+# edit / commit / push from here; primary checkout is untouched
+cd -
+git worktree remove ../<repo>-<branch>
+```
+
+**Required for staging:** surgical `git add <paths>`. Never `-A` / `.`.
+
+**Never revert files you didn't touch.** If `git status` shows unfamiliar changes, leave them — they belong to another session, an upstream pull, or a hook side-effect. The umbrella rule: never run a git command that mutates state belonging to a path other than the file you just edited.
-## PRE-ACTION PROTOCOL
+### Public-surface hygiene
+
+🚨 The four rules below have hooks that re-print the rule on every public-surface `git` / `gh` command. The rules apply even when the hooks are not installed.
-**MANDATORY**: Review CLAUDE.md before any action. No exceptions.
+- **Real customer / company names** — never write one into a commit, PR, issue, comment, or release note. Replace with `Acme Inc` or rewrite the sentence to not need the reference. (No enumerated denylist exists — a denylist is itself a leak.)
+- **Private repos / internal project names** — never mention. Omit the reference entirely; don't substitute "an internal tool" — the placeholder is a tell.
+- **Linear refs** — never put `SOC-123`/`ENG-456`/Linear URLs in code, comments, or PR text. Linear lives in Linear.
+- **Publish / release / build-release workflows** — never `gh workflow run|dispatch` or `gh api …/dispatches`. Dispatches are irrevocable. The user runs them manually.
-- Before ANY structural refactor on a file >300 LOC: remove dead code/unused exports/imports first — commit separately
-- Multi-file changes: break into phases (≤5 files each), verify each phase before the next
-- Study existing code before building
-- Work from raw error data, not theories — if a bug report has no error output, ask for it
-- On "yes", "do it", or "go": execute immediately, no plan recap
+### Commits & PRs
-## VERIFICATION PROTOCOL
+- Conventional Commits `<type>(<scope>): <subject>` — NO AI attribution.
+- **When adding commits to an OPEN PR**, update the PR title and description to match the new scope. Use `gh pr edit --title … --body …`. The reviewer should know what's in the PR without scrolling commits.
+- **Replying to Cursor Bugbot** — reply on the inline review-comment thread, not as a detached PR comment: `gh api repos/{owner}/{repo}/pulls/{pr}/comments/{comment_id}/replies -X POST -f body=…`.
-1. Run the actual command — execute, don't assume
-2. State what you verified, not just "looks good"
-3. **FORBIDDEN**: Claiming "Done" when any test output shows failures
-4. Run type-check/lint if configured; fix ALL errors before reporting done
-5. Re-read every modified file; confirm nothing references removed items
+### Programmatic Claude calls
-## CONTEXT & EDIT SAFETY
+🚨 Workflows / skills / scripts that invoke `claude` CLI or `@anthropic-ai/claude-agent-sdk` MUST set all four lockdown flags: `tools`, `allowedTools`, `disallowedTools`, `permissionMode: 'dontAsk'`. Never `default` mode in headless contexts. Never `bypassPermissions`. See `.claude/skills/programmatic-claude-lockdown/SKILL.md`.
-- After 10+ messages: re-read files before editing
-- Read files >500 LOC in chunks (offset/limit)
-- Before every edit: re-read. After every edit: re-read to confirm
-- When renaming: search direct calls, type refs, string literals, dynamic imports, re-exports, tests
-- Tool results over 50K chars are silently truncated — narrow scope and re-run if incomplete
-- For tasks touching >5 files: use sub-agents with worktree isolation
-- Never fix a display/rendering problem by duplicating state
+### Tooling
-## JUDGMENT PROTOCOL
+- **Package manager**: `pnpm`. Run scripts via `pnpm run foo --flag`, never `foo:bar`. After `package.json` edits, `pnpm install`.
+- 🚨 NEVER use `npx`, `pnpm dlx`, or `yarn dlx` — use `pnpm exec <tool>` or `pnpm run
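
The allowlist match policy spelled out in the `check-paths` comments near the top of this diff is compact enough to sketch in TypeScript. Everything below is illustrative only: the `AllowEntry`/`Finding` shapes, the normalization details, and checking `pattern` against the normalized snippet are assumptions about what `scripts/check-paths.mts` does, not its actual implementation.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical shapes; the real check-paths.mts types may differ.
interface AllowEntry {
  rule: string
  file: string
  pattern: string
  line?: number
  snippet_hash?: string
}

interface Finding {
  rule: string
  file: string
  line: number
  snippet: string
}

// "Normalized snippet (whitespace collapsed)": collapse whitespace
// runs to single spaces, then keep the first 12 hex chars of the
// SHA-256 digest. Reformatting that only moves whitespace around
// therefore does not change the hash.
function snippetHash(snippet: string): string {
  const normalized = snippet.replace(/\s+/g, ' ').trim()
  return createHash('sha256').update(normalized).digest('hex').slice(0, 12)
}

// Match policy: rule + file + pattern always apply. When `line` or
// `snippet_hash` is set, EITHER one matching is sufficient; when
// neither is set, the broader rule + file + pattern match decides.
// The pattern is checked against the normalized snippet here (an
// assumption) so whitespace-only reformatting doesn't defeat it.
function entryMatches(e: AllowEntry, f: Finding): boolean {
  if (e.rule !== f.rule || e.file !== f.file) return false
  const normalized = f.snippet.replace(/\s+/g, ' ').trim()
  if (!normalized.includes(e.pattern)) return false
  if (e.line === undefined && e.snippet_hash === undefined) return true
  return e.line === f.line || e.snippet_hash === snippetHash(f.snippet)
}

// A reformat that moves and re-wraps the snippet still matches via hash.
const entry: AllowEntry = {
  rule: 'A',
  file: 'packages/foo/scripts/legacy-build.mts',
  pattern: "path.join(testDir, 'out', 'Final')",
  line: 12,
  snippet_hash: snippetHash("path.join(testDir, 'out', 'Final')"),
}
const moved: Finding = {
  rule: 'A',
  file: 'packages/foo/scripts/legacy-build.mts',
  line: 40, // line drifted...
  snippet: "path.join(testDir,\n  'out', 'Final')", // ...whitespace changed
}
console.log(entryMatches(entry, moved)) // true
```

Under these assumptions the example also shows why the comments prefer hash-bearing entries: a strict `line` field alone would have stopped matching the moment the snippet moved.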