docs(res-to-affine): corpus run + regex precision fixes (Refs #57)#319
Merged
Conversation
First end-to-end run of the Phase-1 scanner against the estate's 491
deduplicated .res files (idaptik + gitbot-fleet) surfaced two
high-impact false-positive sources in the top-level regexes:
- `side-effect-import` fired on in-function `let _ = X.foo(...)`,
ReScript's normal "discard a chained call's return value" idiom
(1,181 hits, vast majority indented; LESSONS.md's anti-pattern is
the top-level form only).
- `mutable-global` fired on any `:=`, including local ref mutation
inside function bodies (653 hits, none at column 0 in the corpus).
Both regexes are now anchored at column 0, matching the LESSONS.md
anti-pattern shape (module-load side effect, module-scoped mutable).
`re_untyped_exn` also gains a `(^|...)` alternation so `raise` / `try`
at column 0 is no longer missed.
Total markers across the corpus: 2,146 → 348 (−84%).
Files with any marker: 216 → 94. The 3-test snapshot suite is
unchanged — the synthetic fixture covers column-0 cases only.
Adds `tools/res-to-affine/CORPUS-RUN.md` (human report with corpus
stats, regex change-log, top-hotspots table, and Phase-2 follow-ups)
plus `CORPUS-RUN.json` (machine-readable sidecar). README links both.
The trade-off: we no longer flag module-load side effects nested
inside `module X = { ... }` blocks, nor `let x = ref(...)` top-level
declarations paired with their later `:=`. Those are scoped to
Phase 2's AST walker and listed in CORPUS-RUN.md as follow-ups.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 21, 2026
hyperpolymath
added a commit
that referenced
this pull request
May 21, 2026
…hase 2b, Refs #57) (#322) * feat(res-to-affine): land tree-sitter grammar build pipeline (Phase 2a, Refs #57) Phase 2 of the `.res → .affine` migration assistant (#57) replaces the Phase-1 line-regex scanner with a tree-sitter AST walker. The grammar itself is manifest-vendored in `editors/tree-sitter-rescript/` (since #314), but the actual build pipeline that turns the manifest into a loadable parser had not been wired up. This commit is that wiring. - `justfile` gains an `install-grammar` recipe that wraps the existing `editors/tree-sitter-rescript/scripts/install.sh` so the bootstrap step is a discoverable `just` recipe alongside `build`/`test`/etc. - `editors/tree-sitter-rescript/scripts/install.sh` updates its error message when the `tree-sitter` CLI is missing to point at both the Rust-native install (`cargo install tree-sitter-cli`, the repo's preferred path for CLI tooling per CLAUDE.md) and the existing Node-based one (`npm install -g tree-sitter-cli`). The script behaviour is unchanged when the CLI is present. - `.github/workflows/ci.yml` gains a `migration-assistant` job that installs the tree-sitter CLI via npm (the fast CI path — a pre-built binary, ~5 s vs. ~5 min for `cargo install` from source), runs `just install-grammar`, verifies `tools/vendor/tree-sitter-rescript/ src/parser.c` was produced, and smoke-parses the existing `tools/res-to-affine/test/fixtures/sample.res` fixture. The job sits alongside `vscode-smoke` as a Node-using carve-out under the same reasoning (manifest dep on an npm-distributed tool whose binary output is what we actually consume; no new TS source in this repo). - `editors/tree-sitter-rescript/README.md` documents the dual install path, the new justfile recipe, and the CI gate. Phase 2b (the actual walker — `walker.ml`, AST-based detection of `side-effect-import`, `--engine` CLI flag, snapshot tests vs. the regex scanner) stacks on top of this branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(res-to-affine): tree-sitter AST walker for side-effect-import (Phase 2b, Refs #57) Lands the first AST-based detection pass on top of the Phase-2a grammar build pipeline (#321). New module `tools/res-to-affine/ walker.ml` shells out to the vendored `tree-sitter` CLI, parses its default s-expression output (with embedded `[row, col]` byte ranges), and walks the AST to find the `side-effect-import` anti-pattern structurally rather than via the column-0-anchored regex the Phase-1 scanner uses. Detection upgrade vs. Phase 1 ----------------------------- The walker flags `let _ = Mod.value` only when the binding sits at module top level — direct child of `source_file`, or direct child of a `block` that is the body of a `module_declaration`. The Phase-1 regex matches any column-0 occurrence of the same shape. On the existing `sample.res` fixture the walker reports 1 finding (the intended top-level `let _ = Pixi.Sound.register` on line 8); the scanner reports the same 1 plus would also have reported the same pattern nested inside a function body. The regex pattern that #319 band-aided with column-0 anchoring is eliminated structurally by the walker. CLI --- New `--engine={scanner,walker}` flag on the CLI (default: scanner, preserving Phase-1 behaviour); `--grammar-dir DIR` overrides the walker's parser location (default: `tools/vendor/tree-sitter-rescript`, matching `install.sh` output). When the walker hits any error (grammar missing, tree-sitter not on PATH, s-exp parse failure), it prints a one-line reason to stderr and falls back to the scanner — the CLI never bails because of walker problems. Tests ----- `test/test_walker.ml` adds two end-to-end tests under a new `res-to-affine-walker` alcotest run. Both auto-skip if the `tree-sitter` CLI is missing or the vendored grammar has not been built, so `dune runtest` on a fresh clone (no bootstrap) still passes. The repo-root discovery walks up from cwd looking for `dune-project` to find the source-tree grammar path; the dune sandbox otherwise hides it. CI -- The existing `build` job now installs `tree-sitter-cli` (npm, fast) and runs `install.sh` before `dune runtest`, so the walker tests actually execute rather than auto-skip. The migration-assistant gate added in #321 stays — it remains the dedicated first-signal job for grammar-pin drift. Scope discipline ---------------- `raw-js`, `untyped-exception`, `mutable-global` remain scanner-only; their AST counterparts land in Phase 2c. The walker exposes a stable surface (`Walker.scan : grammar_dir:string -> path:string -> source:string -> Scanner.finding list`) that 2c extends rather than re-architects. Stack: #321 (Phase 2a) → this PR (Phase 2b) → Phase 2c. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
First end-to-end run of the Phase-1 scanner (#314) against the estate's
real
.ressurface — 491 deduplicated files acrossidaptik(475) andgitbot-fleet(16). The run surfaced two high-impact false-positivesources, which this PR fixes alongside the corpus report.
side-effect-importwas flagging in-functionlet _ = X.f(...)— ReScript's normal "discard chained-call return value" idiom — not
just the LESSONS.md "module-load side effect" anti-pattern. 1,181
hits in the corpus, vast majority indented.
mutable-globalwas flagging any:=, includingintra-function local-ref mutation. 653 hits in the corpus, none at
column 0.
Both regexes now anchor at column 0.
re_untyped_exnalso picks upcolumn-0
raise/tryvia a(^|...)alternation.Impact
side-effect-importmutable-globalraw-jsuntyped-exceptionSpot-checked the new top hotspots — every remaining
side-effect-importhit is genuinely column-0 (e.g.
idaptik/src/Main.res:5:let _ = PixiSound.sound);every
raw-jsanduntyped-exceptionhit corresponds to a real%raw(…)block or
try/Js.Exn/Promise.catchoccurrence.What's added
tools/res-to-affine/CORPUS-RUN.md— human report (corpus stats,regex change-log, top hot-spots, Phase-2 follow-ups, reproducer).
tools/res-to-affine/CORPUS-RUN.json— machine-readable sidecar.tools/res-to-affine/README.md— link to both.tools/res-to-affine/scanner.ml— three regex tightenings withan inline comment naming the column-0 vs in-function trade-off.
Trade-off
We no longer flag:
module X = { ... }blockslet x = ref(...)declarations (we still flag the:=that follows, if column-0)
Both are scoped to Phase 2's AST walker and listed in
CORPUS-RUN.mdas follow-ups (item 2 + item 3).Refs #57 (the issue is multi-phase, this is a precision pass on
Phase 1, not a phase close).
Test plan
dune build tools/res-to-affine— clean.dune test tools/res-to-affine/— 3/3 OK (snapshot byte-identicalbecause the synthetic fixture covers column-0 cases only).
tally matches the table above and spot-checked remaining hits.
🤖 Generated with Claude Code