127 changes: 127 additions & 0 deletions docs/MIGRATION-ASSISTANT.adoc
@@ -0,0 +1,127 @@
// SPDX-License-Identifier: MPL-2.0
// SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell
= ReScript → AffineScript migration assistant — architectural decision
:toc: macro
:toclevels: 2

Tracks issue #57 (parser + metaparser). Companion to
link:RESCRIPT-ELIMINATION.adoc[RESCRIPT-ELIMINATION.adoc], which is the
authoritative ledger for the broader estate ReScript-surface retirement.

toc::[]

== Context

Estate language policy retires ReScript in favour of AffineScript →
typed-wasm. Per the inventories captured in
link:RESCRIPT-ELIMINATION.adoc[RESCRIPT-ELIMINATION.adoc] and the
downstream tracker `hyperpolymath/gitbot-fleet#148`, ~5k LOC of ReScript
remains in `gitbot-fleet/bots/sustainabot/bot-integration/src/` alone;
the idaptik tail is ~542 `.res` files plus ~80 `.ts`. By-hand
translation is impractical, *and* a literal transliterator misses the
point — the AffineScript answer to ReScript's anti-patterns is
*re-decomposition*, not a token-level rewrite.

Issue #57 proposes a *migration assistant*: a tool that reads `.res`,
recognises the anti-patterns surfaced by the idaptik Wave 3 pilot, and
emits a `.affine` *skeleton* that surfaces the work the human migrator
still owes.

== Decision

* The migration assistant lives at `tools/res-to-affine/` as an OCaml
CLI built by the repo's existing `dune` toolchain.
* The canonical source-of-truth grammar for `.res` parsing is
https://github.com/rescript-lang/tree-sitter-rescript[`rescript-lang/tree-sitter-rescript`],
vendored manifest-only at `editors/tree-sitter-rescript/` (pinned to
commit `990214a83f25801dfe0226bd7e92bb71bba1970f`, version 6.0.0,
MIT-licensed and compatible with this repo's MPL-2.0).
* The tool ships in three phases:
+
[cols="1,3,2"]
|===
| Phase | What it does | Status

| 1
| Text-scan emitter detecting 4 of the 6 anti-patterns, emitting a
`.affine` skeleton with migration markers and the quoted original
for reference.
| this PR

| 2
| Replaces the text scanner with a tree-sitter AST walker reading the
vendored grammar. Adds the two deferred patterns. Same emitter
interface.
| follow-up

| 3
| Partial *translation* of pure-structural forms (type aliases, sum
decls, simple `let` bindings, `switch` → `match`). Effect-laden /
exception-bearing / globally-mutating regions remain TODO islands.
| follow-up
|===
* The Phase-1 deliverable is **deliberately small and useful in
isolation**. It gates the architectural commitment to tree-sitter
behind something that already pays its way against real estate
`.res` files.

== Alternatives considered

=== Use the ReScript compiler's own AST (`bs-tools` / `rescript ast`)

The richest signal is in the ReScript compiler's typed AST. Rejected
because:

* Adding `rescript` as a build-time dependency contradicts the estate
language policy (which bans new ReScript code and treats ReScript as
the artefact to be retired).
* The ReScript compiler's AST changes across versions in
non-backwards-compatible ways; pinning would create an ongoing
compatibility burden in the wrong direction.

=== Write a hand-rolled `.res` lexer/parser in OCaml

We already have `lib/rescript_codegen.ml` going *affinescript → .res*,
so the grammar is partly understood. Rejected because:

* ReScript's surface syntax is large; recreating it for a one-way
migration tool is days-to-weeks of work that the canonical
tree-sitter grammar has already done and maintains.
* The community grammar is MIT-licensed and version-pinned; the cost
of consuming it is a one-line manifest plus an install script.

=== Pattern-detector only (no AST in any phase)

Phase 1 *is* this — but committing to it permanently would leave the
two structural anti-patterns (callback records, oversized functions)
undetected forever, and would block Phase 3 (partial translation),
which is what makes the tool earn its keep on idaptik's 542 files.

== Consequences

* `editors/tree-sitter-rescript/` exists for the migration pipeline,
not as an editor binding. The editor binding for AffineScript itself
remains `editors/tree-sitter-affinescript/`.
* `tools/res-to-affine/` is the first OCaml tool under `tools/`
(existing tools are shell scripts or Rust). The `dune` integration
is local to the tool's own `dune` file; no workspace changes.
* Phase 2 introduces the `tree-sitter` CLI as a runtime dependency for the
migration assistant. It is *not* a build-time dependency for the
AffineScript compiler itself. CI for the migration tool's Phase-2
tests will need to install `tree-sitter-cli`.
* The phase plan is recorded in
link:../tools/res-to-affine/README.md[`tools/res-to-affine/README.md`].
This document is the architectural decision; the README is the
user/contributor surface.

== References

* `tools/res-to-affine/README.md` — tool usage, Phase plan, design rationale.
* `editors/tree-sitter-rescript/README.md` — vendoring manifest details.
* `affinescript#57` — parser + metaparser proposal.
* `gitbot-fleet#148` — downstream tracker for the consumed ReScript subtree.
* link:RESCRIPT-ELIMINATION.adoc[`RESCRIPT-ELIMINATION.adoc`] — estate-wide ledger.
* https://github.com/hyperpolymath/idaptik/blob/main/migration/main/LESSONS.md[idaptik LESSONS.md]
— six anti-patterns the assistant targets.
* https://github.com/hyperpolymath/idaptik/blob/main/migration/main/PILOT.md[idaptik PILOT.md]
— original Wave-3 pilot that surfaced the six patterns.
44 changes: 44 additions & 0 deletions editors/tree-sitter-rescript/README.md
@@ -0,0 +1,44 @@
<!-- SPDX-License-Identifier: MPL-2.0 -->
<!-- SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell -->

# tree-sitter-rescript (vendoring manifest)

This directory is a **manifest-only vendoring** of the canonical
[`rescript-lang/tree-sitter-rescript`][upstream] grammar. The grammar
itself is not copied into this repository — `package.json` declares it
as a dependency, and `scripts/install.sh` fetches and builds it at the
pinned commit.

The grammar is consumed by `tools/res-to-affine/`, the `.res → .affine`
migration assistant (`affinescript#57`). It is **not** an editor binding
for AffineScript; for that, see `editors/tree-sitter-affinescript/`.

## Pinned upstream

- **Repository:** <https://github.com/rescript-lang/tree-sitter-rescript>
- **Commit:** `990214a83f25801dfe0226bd7e92bb71bba1970f`
- **Version:** 6.0.0
- **License:** MIT (preserved upstream; compatible with this repo's MPL-2.0)

When updating the pin, regenerate `tools/res-to-affine/test/expected/`
snapshots, since AST shapes may shift.

## Install

```sh
./scripts/install.sh
```

This writes a `tree-sitter-rescript` directory under `tools/vendor/`
(gitignored, following the same convention as the WASI adapter pinning)
containing the generated parser. Requires `git` and the `tree-sitter`
CLI on PATH.
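
To confirm that the checkout under `tools/vendor/` is actually at the pinned commit, a check along the following lines can be run after installing. The helper name `check_pin` is hypothetical and is not part of `scripts/install.sh`; it relies only on `git rev-parse`.

```sh
# check_pin DIR COMMIT: succeed only if DIR is a git checkout whose HEAD is COMMIT.
# Hypothetical helper, not shipped with this directory.
check_pin() {
  local head
  head="$(git -C "$1" rev-parse HEAD 2>/dev/null)" || return 1
  [ "$head" = "$2" ]
}

# Usage from the repository root:
#   check_pin tools/vendor/tree-sitter-rescript 990214a83f25801dfe0226bd7e92bb71bba1970f \
#     && echo "pin OK" || echo "pin mismatch: re-run the install script"
```

A mismatch usually means the pin was updated in one place but the install script has not been re-run.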

## Why manifest, not copy

The upstream grammar is ~10k lines of JS plus generated C. Copying it
into this MPL-2.0 repo would (a) bloat the tree, (b) create an ongoing
sync burden, and (c) duplicate MIT-licensed code we have no business
modifying. The manifest+install approach keeps the dependency explicit
and pinned without absorbing the source.

[upstream]: https://github.com/rescript-lang/tree-sitter-rescript
16 changes: 16 additions & 0 deletions editors/tree-sitter-rescript/package.json
@@ -0,0 +1,16 @@
{
  "name": "@affinescript/tree-sitter-rescript-vendoring",
  "version": "0.1.0",
  "private": true,
  "description": "Manifest-only vendoring of rescript-lang/tree-sitter-rescript for the .res -> .affine migration assistant (affinescript#57).",
  "license": "MPL-2.0",
  "dependencies": {
    "tree-sitter-rescript": "github:rescript-lang/tree-sitter-rescript#990214a83f25801dfe0226bd7e92bb71bba1970f"
  },
  "devDependencies": {
    "tree-sitter-cli": "^0.25.0"
  },
  "scripts": {
    "install-grammar": "./scripts/install.sh"
  }
}
41 changes: 41 additions & 0 deletions editors/tree-sitter-rescript/scripts/install.sh
@@ -0,0 +1,41 @@
#!/usr/bin/env bash
# SPDX-License-Identifier: MPL-2.0
# SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell
#
# Fetch and build the pinned tree-sitter-rescript grammar.
# Output goes under <repo root>/tools/vendor/tree-sitter-rescript/ (gitignored).

set -euo pipefail

UPSTREAM_COMMIT="990214a83f25801dfe0226bd7e92bb71bba1970f"
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../.." && pwd)"
# tools/vendor/ is the repo's convention for fetched-not-committed deps
# (see .gitignore line 103, mirrors the WASI adapter provisioning).
BUILD_DIR="${REPO_ROOT}/tools/vendor/tree-sitter-rescript"

if ! command -v tree-sitter >/dev/null 2>&1; then
  echo "error: tree-sitter CLI not found on PATH" >&2
  echo " install via: npm install -g tree-sitter-cli" >&2
  exit 2
fi

if ! command -v git >/dev/null 2>&1; then
  echo "error: git not found on PATH" >&2
  exit 2
fi

mkdir -p "$(dirname "$BUILD_DIR")"

if [ -d "$BUILD_DIR/.git" ]; then
  git -C "$BUILD_DIR" fetch --quiet origin "$UPSTREAM_COMMIT" || true
  git -C "$BUILD_DIR" checkout --quiet "$UPSTREAM_COMMIT"
else
  rm -rf "$BUILD_DIR"
  git clone --quiet https://github.com/rescript-lang/tree-sitter-rescript.git "$BUILD_DIR"
  git -C "$BUILD_DIR" checkout --quiet "$UPSTREAM_COMMIT"
fi

cd "$BUILD_DIR"
tree-sitter generate

echo "tree-sitter-rescript built at ${BUILD_DIR} (commit ${UPSTREAM_COMMIT})"
138 changes: 138 additions & 0 deletions tools/res-to-affine/README.md
@@ -0,0 +1,138 @@
<!-- SPDX-License-Identifier: MPL-2.0 -->
<!-- SPDX-FileCopyrightText: 2026 Jonathan D.A. Jewell -->

# `res-to-affine` — ReScript-to-AffineScript migration assistant

A small OCaml CLI that reads a `.res` file and emits a `.affine` skeleton
with **migration markers** — comments that name each anti-pattern the
scanner found, point at the source line, and propose the AffineScript
answer the human migrator should consider before porting.

Tracks: [`affinescript#57`](https://github.com/hyperpolymath/affinescript/issues/57)
(parser + metaparser).
Consumed by: [`hyperpolymath/gitbot-fleet#148`](https://github.com/hyperpolymath/gitbot-fleet/issues/148)
and the broader `idaptik` migration.

## Usage

```sh
# print skeleton to stdout
dune exec tools/res-to-affine/main.exe -- path/to/Foo.res

# or write to a file
dune exec tools/res-to-affine/main.exe -- path/to/Foo.res -o Foo.affine
```

The output is **not compilable**. It is a starting point for the human
migrator: a migration-considerations block at the top, a `module` stub
with `TODO`s in the middle, and a quoted copy of the original at the
bottom. The tool surfaces what needs re-decomposing; the human picks
the decomposition.

## What gets flagged (Phase 1)

The
[idaptik Wave 3 pilot](https://github.com/hyperpolymath/idaptik/blob/main/migration/main/LESSONS.md)
surfaced six anti-patterns, of which the line-based scanner reliably
detects four:

| Tag | Detection | AffineScript answer |
|---|---|---|
| `side-effect-import` | `let _ = Mod.foo` at top level | Explicit registration call |
| `raw-js` | `%raw(...)` or `[%bs.raw ...]` | Typed extern (`ABI-FFI-README.md`) |
| `untyped-exception` | `Promise.catch`, `Js.Exn`, `raise`, `try` | `Result[E, A]` / `Validation[E, A]` |
| `mutable-global` | `:=` operator | Affine record threaded through |

Deferred to Phase 2 (need real AST):

- **inline lambda callback record** — N ≥ 3 `~handler: (...) =>` lambdas
inside one record literal (collapse to a row-polymorphic record).
- **oversized function** — function body > ~50 LOC (decompose).
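
As a rough illustration of what a line-based scan for these four tags looks like, the sketch below approximates the table's detections with `grep`. It is not the tool's `Scanner`, which is OCaml and uses `str` regexes; the function names `tag_lines` and `scan_res` and the exact patterns are assumptions made for this example. Like any line-based scan, it produces false positives (for instance, the word `try` inside a comment).

```sh
# tag_lines TAG PATTERN FILE: print "TAG:LINE" for each line of FILE
# matching the extended regex PATTERN. Hypothetical helper.
tag_lines() {
  grep -nE -- "$2" "$3" | cut -d: -f1 | sed "s/^/$1:/"
}

# scan_res FILE: grep approximation of the four Phase-1 detections.
# "Top level" is approximated as "starts in column 0".
scan_res() {
  tag_lines side-effect-import '^let _ = [A-Z][A-Za-z0-9_]*\.' "$1"
  tag_lines raw-js '%raw\(|\[%bs\.raw' "$1"
  tag_lines untyped-exception \
    'Promise\.catch|Js\.Exn|(^|[^A-Za-z0-9_])(raise|try)([^A-Za-z0-9_]|$)' "$1"
  tag_lines mutable-global ':=' "$1"
}
```

Running `scan_res path/to/Foo.res` prints one `tag:line` pair per hit, grouped by tag, which can help decide which files to run through the assistant first.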

## Why a skeleton and not a transliteration

The Frontier Programming Guides' standing rule is **re-decompose, not
transliterate**. A line-for-line port preserves the source's anti-patterns
into the target language and produces `.affine` files that are technically
parseable but architecturally still ReScript. The migration assistant's
job is to *make the re-decomposition tractable*, not to skip it. So:

- The skeleton is **honest about being incomplete** — it does not
compile, on purpose.
- The original source is **quoted at the bottom** so the migrator
doesn't tab between files while writing the port.
- Each marker links a source line to the AffineScript pattern that
replaces it, so the migrator's next action is clear.

## Phase plan

### Phase 1 — text-scan emitter (this PR)

- OCaml binary builds with the repo's existing `dune` toolchain.
- `Scanner` walks lines with `str` regexes; cheap and dependency-free.
- `Emitter` writes the migration-considerations block, a `module` stub,
and the quoted source.
- Snapshot tests under `test/` ensure stable output.

This phase is **deliberately small**. It is useful immediately — runs
against any `.res` file, surfaces 4 of 6 anti-patterns, gives the
migrator a starting document — and it gates the architectural commitment
to tree-sitter in Phase 2 behind something that already pays its way.

### Phase 2 — tree-sitter AST walker

- Install the pinned grammar from
`editors/tree-sitter-rescript/` (manifest-only vendoring of
`rescript-lang/tree-sitter-rescript@990214a`).
- Replace `Scanner` with a walker over the s-expression output of
`tree-sitter parse` (run without `--quiet`, which suppresses the
tree), parsed by the existing `sexplib0` dependency.
- Adds the two deferred patterns (callback records, oversized
functions) and unlocks **structural** translation of trivial forms
(e.g. `option<X>` → `Option[X]`, `result<X, Y>` → `Result[Y, X]`,
`switch x { | A => ... }` → `match x { A => ... }`).
- The `Emitter` interface does not change: same skeleton shape, same
marker schema, richer body.

### Phase 3 — partial translation

Once the AST walker exists, the emitter can do more than mark — it can
**translate** the pure-structural parts (type aliases, sum decls,
simple `let` bindings, switch-to-match) and leave only effect-laden,
exception-bearing, or globally-mutating regions as TODO. The skeleton
becomes a working port of ~60–80% of the input, with TODO islands
where re-decomposition is genuinely required.

Phase 3 is when the tool earns its keep on idaptik's 542 files.

## Testing

```sh
dune test tools/res-to-affine/
```

To regenerate snapshots after an intentional emitter change:

```sh
cd tools/res-to-affine/test
../../../_build/default/tools/res-to-affine/main.exe \
fixtures/sample.res > expected/sample.affine
```

The fixture under `test/fixtures/sample.res` is synthetic and exercises
every Phase-1 anti-pattern. Real `.res` files from the estate (e.g.
`gitbot-fleet/bots/sustainabot/bot-integration/src/*.res`) can be run
ad hoc through the CLI without changes to the test suite.
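
For ad hoc runs over a whole directory, a small loop is enough. The sketch below assumes only what the Usage section states, namely that the CLI prints the skeleton to stdout when given a `.res` path; the helper name `skeletons_for_dir` and the output directory in the usage comment are hypothetical.

```sh
# skeletons_for_dir SRC_DIR OUT_DIR EMIT...: run the EMIT command on every
# SRC_DIR/*.res and write its stdout to OUT_DIR/<name>.affine.
# Hypothetical helper; EMIT must print a skeleton to stdout.
skeletons_for_dir() {
  local src="$1" out="$2" f
  shift 2
  mkdir -p "$out"
  for f in "$src"/*.res; do
    [ -e "$f" ] || continue   # no .res files: the glob stays literal
    "$@" "$f" > "$out/$(basename "$f" .res).affine"
  done
}

# Usage from the repository root (output directory is an example):
#   skeletons_for_dir gitbot-fleet/bots/sustainabot/bot-integration/src /tmp/skeletons \
#     dune exec tools/res-to-affine/main.exe --
```

Passing the emitter as a command keeps the loop independent of how the binary is built or invoked.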

## Non-goals

- **Not a ReScript compiler.** The scanner does not parse ReScript;
even Phase 2 only walks the tree-sitter CST, not the ReScript
type-checker's AST. If a `.res` file is syntactically invalid the
tool may still emit a (less useful) skeleton.
- **Not a build-time dependency on ReScript.** The pinned grammar is a
parser, not the ReScript compiler. The estate's language policy
(CLAUDE.md) bans new ReScript code; this tool exists to **help retire
the existing ReScript surface**, not to bring more in.
- **Not for editor integration.** Editor tree-sitter bindings for
AffineScript live at `editors/tree-sitter-affinescript/`; this tool's
vendored grammar is for the migration pipeline only.
13 changes: 13 additions & 0 deletions tools/res-to-affine/dune
@@ -0,0 +1,13 @@
; SPDX-License-Identifier: MPL-2.0

(executable
 (name main)
 (modules main)
 (public_name res-to-affine)
 (package affinescript)
 (libraries res_to_affine cmdliner fmt fmt.tty))

(library
 (name res_to_affine)
 (modules scanner emitter)
 (libraries str))