Skip to content

docs(res-to-affine): corpus run + regex precision fixes (Refs #57)#319

Merged
hyperpolymath merged 1 commit into
mainfrom
corpus-run-57
May 21, 2026
Merged

docs(res-to-affine): corpus run + regex precision fixes (Refs #57)#319
hyperpolymath merged 1 commit into
mainfrom
corpus-run-57

Conversation

@hyperpolymath
Copy link
Copy Markdown
Owner

Summary

First end-to-end run of the Phase-1 scanner (#314) against the estate's
real .res surface — 491 deduplicated files across idaptik (475) and
gitbot-fleet (16). The run surfaced two high-impact false-positive
sources, which this PR fixes alongside the corpus report.

  • side-effect-import was flagging in-function let _ = X.f(...)
    — ReScript's normal "discard chained-call return value" idiom — not
    just the LESSONS.md "module-load side effect" anti-pattern. 1,181
    hits in the corpus, vast majority indented.
  • mutable-global was flagging any :=, including
    intra-function local-ref mutation. 653 hits in the corpus, none at
    column 0.

Both regexes now anchor at column 0. re_untyped_exn also picks up
column-0 raise / try via a (^|...) alternation.

Impact

Before After Δ
Total markers 2,146 348 −84%
Files with markers 216 94 −122
side-effect-import 1,181 36 −1,145
mutable-global 653 0 −653
raw-js 198 198 unchanged
untyped-exception 114 114 unchanged

Spot-checked the new top hotspots — every remaining side-effect-import
hit is genuinely column-0 (e.g. idaptik/src/Main.res:5:let _ = PixiSound.sound);
every raw-js and untyped-exception hit corresponds to a real %raw(…)
block or try/Js.Exn/Promise.catch occurrence.

What's added

  • tools/res-to-affine/CORPUS-RUN.md — human report (corpus stats,
    regex change-log, top hot-spots, Phase-2 follow-ups, reproducer).
  • tools/res-to-affine/CORPUS-RUN.json — machine-readable sidecar.
  • tools/res-to-affine/README.md — link to both.
  • tools/res-to-affine/scanner.ml — three regex tightenings with
    an inline comment naming the column-0 vs in-function trade-off.

Trade-off

We no longer flag:

  • module-load side effects nested inside module X = { ... } blocks
  • top-level let x = ref(...) declarations (we still flag the :=
    that follows, if column-0)

Both are scoped to Phase 2's AST walker and listed in
CORPUS-RUN.md as follow-ups (item 2 + item 3).

Refs #57 (the issue is multi-phase, this is a precision pass on
Phase 1, not a phase close).

Test plan

  • dune build tools/res-to-affine — clean.
  • dune test tools/res-to-affine/ — 3/3 OK (snapshot byte-identical
    because the synthetic fixture covers column-0 cases only).
  • Re-ran scanner over the 491-file corpus, verified the per-kind
    tally matches the table above and spot-checked remaining hits.
  • CI on this PR.

🤖 Generated with Claude Code

First end-to-end run of the Phase-1 scanner against the estate's 491
deduplicated .res files (idaptik + gitbot-fleet) surfaced two
high-impact false-positive sources in the top-level regexes:

- `side-effect-import` fired on in-function `let _ = X.foo(...)`,
  ReScript's normal "discard a chained call's return value" idiom
  (1,181 hits, vast majority indented; LESSONS.md's anti-pattern is
  the top-level form only).
- `mutable-global` fired on any `:=`, including local ref mutation
  inside function bodies (653 hits, none at column 0 in the corpus).

Both regexes are now anchored at column 0, matching the LESSONS.md
anti-pattern shape (module-load side effect, module-scoped mutable).
`re_untyped_exn` also gains a `(^|...)` alternation so `raise` / `try`
at column 0 is no longer missed.

Total markers across the corpus: 2,146 → 348 (−84%).
Files with any marker: 216 → 94. The 3-test snapshot suite is
unchanged — the synthetic fixture covers column-0 cases only.

Adds `tools/res-to-affine/CORPUS-RUN.md` (human report with corpus
stats, regex change-log, top-hotspots table, and Phase-2 follow-ups)
plus `CORPUS-RUN.json` (machine-readable sidecar). README links both.

The trade-off: we no longer flag module-load side effects nested
inside `module X = { ... }` blocks, nor `let x = ref(...)` top-level
declarations paired with their later `:=`. Those are scoped to
Phase 2's AST walker and listed in CORPUS-RUN.md as follow-ups.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@hyperpolymath hyperpolymath merged commit 65fa5ef into main May 21, 2026
0 of 14 checks passed
@hyperpolymath hyperpolymath deleted the corpus-run-57 branch May 21, 2026 00:23
hyperpolymath added a commit that referenced this pull request May 21, 2026
…hase 2b, Refs #57) (#322)

* feat(res-to-affine): land tree-sitter grammar build pipeline (Phase 2a, Refs #57)

Phase 2 of the `.res → .affine` migration assistant (#57) replaces the
Phase-1 line-regex scanner with a tree-sitter AST walker. The grammar
itself is manifest-vendored in `editors/tree-sitter-rescript/` (since
#314), but the actual build pipeline that turns the manifest into a
loadable parser had not been wired up. This commit is that wiring.

- `justfile` gains an `install-grammar` recipe that wraps the existing
  `editors/tree-sitter-rescript/scripts/install.sh` so the bootstrap
  step is a discoverable `just` recipe alongside `build`/`test`/etc.
- `editors/tree-sitter-rescript/scripts/install.sh` updates its error
  message when the `tree-sitter` CLI is missing to point at both the
  Rust-native install (`cargo install tree-sitter-cli`, the repo's
  preferred path for CLI tooling per CLAUDE.md) and the existing
  Node-based one (`npm install -g tree-sitter-cli`). The script
  behaviour is unchanged when the CLI is present.
- `.github/workflows/ci.yml` gains a `migration-assistant` job that
  installs the tree-sitter CLI via npm (the fast CI path — a pre-built
  binary, ~5 s vs. ~5 min for `cargo install` from source), runs
  `just install-grammar`, verifies `tools/vendor/tree-sitter-rescript/
  src/parser.c` was produced, and smoke-parses the existing
  `tools/res-to-affine/test/fixtures/sample.res` fixture. The job sits
  alongside `vscode-smoke` as a Node-using carve-out under the same
  reasoning (manifest dep on an npm-distributed tool whose binary
  output is what we actually consume; no new TS source in this repo).
- `editors/tree-sitter-rescript/README.md` documents the dual install
  path, the new justfile recipe, and the CI gate.

Phase 2b (the actual walker — `walker.ml`, AST-based detection of
`side-effect-import`, `--engine` CLI flag, snapshot tests vs. the
regex scanner) stacks on top of this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(res-to-affine): tree-sitter AST walker for side-effect-import (Phase 2b, Refs #57)

Lands the first AST-based detection pass on top of the Phase-2a
grammar build pipeline (#321). New module `tools/res-to-affine/
walker.ml` shells out to the vendored `tree-sitter` CLI, parses its
default s-expression output (with embedded `[row, col]` byte ranges),
and walks the AST to find the `side-effect-import` anti-pattern
structurally rather than via the column-0-anchored regex the Phase-1
scanner uses.

Detection upgrade vs. Phase 1
-----------------------------

The walker flags `let _ = Mod.value` only when the binding sits at
module top level — direct child of `source_file`, or direct child of
a `block` that is the body of a `module_declaration`. The Phase-1
regex matches any column-0 occurrence of the same shape. On the
existing `sample.res` fixture the walker reports 1 finding (the
intended top-level `let _ = Pixi.Sound.register` on line 8); the
scanner reports the same 1 plus would also have reported the same
pattern nested inside a function body. The regex pattern that #319
band-aided with column-0 anchoring is eliminated structurally by
the walker.

CLI
---

New `--engine={scanner,walker}` flag on the CLI (default: scanner,
preserving Phase-1 behaviour); `--grammar-dir DIR` overrides the
walker's parser location (default: `tools/vendor/tree-sitter-rescript`,
matching `install.sh` output). When the walker hits any error
(grammar missing, tree-sitter not on PATH, s-exp parse failure), it
prints a one-line reason to stderr and falls back to the scanner —
the CLI never bails because of walker problems.

Tests
-----

`test/test_walker.ml` adds two end-to-end tests under a new
`res-to-affine-walker` alcotest run. Both auto-skip if the
`tree-sitter` CLI is missing or the vendored grammar has not been
built, so `dune runtest` on a fresh clone (no bootstrap) still
passes. The repo-root discovery walks up from cwd looking for
`dune-project` to find the source-tree grammar path; the dune sandbox
otherwise hides it.

CI
--

The existing `build` job now installs `tree-sitter-cli` (npm, fast)
and runs `install.sh` before `dune runtest`, so the walker tests
actually execute rather than auto-skip. The migration-assistant
gate added in #321 stays — it remains the dedicated first-signal
job for grammar-pin drift.

Scope discipline
----------------

`raw-js`, `untyped-exception`, `mutable-global` remain
scanner-only; their AST counterparts land in Phase 2c. The walker
exposes a stable surface (`Walker.scan : grammar_dir:string ->
path:string -> source:string -> Scanner.finding list`) that 2c
extends rather than re-architects.

Stack: #321 (Phase 2a) → this PR (Phase 2b) → Phase 2c.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant