Skip to content

feat(types): migrate builder stages, search, and graph domain to TypeScript (Phase 5.4)#579

Open
carlos-alm wants to merge 38 commits intomainfrom
feat/ts-migrate-phase5.5
Open

feat(types): migrate builder stages, search, and graph domain to TypeScript (Phase 5.4)#579
carlos-alm wants to merge 38 commits intomainfrom
feat/ts-migrate-phase5.5

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Migrate 24 JS files to strict TypeScript across builder stages, search subsystem, and graph domain utilities
  • Delete all 24 stale .js counterparts — no dual-file coexistence
  • Update ROADMAP.md Phase 5.4 progress (39 remaining → 15 remaining)

Files migrated

Domain Count Files
domain/graph/builder/stages/ 9 collect-files, parse-files, resolve-imports, build-edges, insert-nodes, build-structure, run-analyses, detect-changes, finalize
domain/graph/builder/ 1 incremental
domain/graph/ 3 cycles, journal, change-journal
domain/search/search/ 6 hybrid, semantic, keyword, filters, prepare, cli-formatter
domain/search/stores/ 2 fts5, sqlite-blob
domain/search/strategies/ 3 source, structured, text-utils

Key type fixes

  • ParseChange.stat type alignment: convert mtimeMsmtime at the detect-changes boundary so downstream consumers (insert-nodes) get the expected shape
  • IncrementalStmts duck-typed interfaces for watcher compatibility (watcher wraps statements with custom objects)
  • NodeWithId relaxed to Pick<NodeRow, ...> for partial node queries in embedding generator
  • FTS5 ESCAPE clause backslash fix (double-escaped '\\''\')
  • BuildGraphOpts extended with scope and skipRegistry fields used at runtime

Test plan

  • npx tsc --noEmit passes with zero errors
  • Full test suite: 2032 passed, 0 regressions (3 pre-existing failures: 2 Windows EPERM fixture copies, 1 native/WASM parity)
  • Incremental build parity tests pass
  • Search subsystem tests pass (embedder, keyword, hybrid, semantic)

…housekeep

Four recurring maintenance routines as Claude Code skills:
- /deps-audit: vulnerability scanning, staleness, unused deps, license checks
- /bench-check: benchmark regression detection against saved baselines
- /test-health: flaky test detection, dead tests, coverage gap analysis
- /housekeep: clean worktrees, dirt files, sync main, prune branches
…line

- Replace 2>/dev/null with output=$(... 2>&1) + exit_code check on all
  four benchmark invocations so error messages are captured and recorded
- Add division-by-zero guard in Phase 3: when baseline == 0, mark delta
  as "N/A — baseline was zero" (informational only, not a regression)
- Add git add + git commit step in Phase 5 so the baseline file is
  actually committed after each save, matching the documented rule
- After reverting package.json + package-lock.json on --fix test failure,
  also run `npm ci` to resync node_modules/ with the restored lock file;
  without this the manifest is reverted but installed packages are not
- Add explanatory comment on @anthropic-ai/tokenizer skip-list entry
  clarifying it is a peer dependency of @anthropic-ai/sdk and may be
  required at runtime without an explicit import in our code
…erion

- Phase 5 (Update Codegraph): add source-repo guard that skips the
  self-update logic when running inside the codegraph source repo;
  comparing the dev version to the published release and running
  npm install is a no-op since codegraph is not one of its own deps
- Phase 1b stale-worktree criterion: replace "created more than 7 days
  ago" (not determinable via git worktree list) with "last commit on the
  branch is more than 7 days old AND branch has no commits ahead of
  origin/main", using `git log -1 --format=%ci <branch>`
When Phase 0 stash push is a no-op (manifests unchanged), Phase 7
was calling stash drop/pop on the wrong entry. Track STASH_CREATED
exit code and branch on it: use git checkout when no stash exists.
…ent corruption

Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel
sessions get isolated directories. Add cleanup at end of analysis.
…plicit commit paths

Add 4th verdict path for --save-baseline when baseline already exists.
Replace corrupted em-dash character in N/A string. Change commit command
to use explicit file paths per project convention.
…ress

Phase 5 was listed as "2 of 7 complete" with outdated pre-Phase 3 file
paths. Updated to reflect actual state: 32 of 269 source modules migrated
(~12%). Steps 5.3-5.5 now list exact migrated/remaining files with verified
counts (5.3=8, 5.4=54, 5.5=175, total=237 JS-only files). Added note about
14 stale .js counterparts of already-migrated .ts files needing deletion.
When STASH_CREATED=1 and tests pass, the npm audit fix changes are
good — no action needed. Previously it ran git checkout to discard
them, which undid the successful fix.
git diff --quiet ignores untracked files, so on the first run when
baseline.json and history.ndjson are newly created, the commit was
skipped. Stage first with git add, then check with --cached.
Branch deletion now asks for user confirmation before each delete,
consistent with worktree removal in Phase 1c.
- bench-check: add timeout 300 wrappers to all 4 benchmark invocations
  with exit code 124 check for timeout detection
- bench-check: add explicit COMPARE_ONLY guard at Phase 5 entry
- housekeep: fix grep portability — use grep -cE instead of GNU \| syntax
- test-health: add timeout 180 wrapper in flaky detection loop
- test-health: fix find command -o precedence with grouping parentheses
When both stat variants (GNU and BSD) fail, $size is empty and the
arithmetic comparison errors out. Add a [ -z "$size" ] && continue
guard so the loop skips files whose size cannot be determined.
…check (#565)

Phase 6: when SAVE_ONLY or first-run (no prior baseline), write a
shortened report with "Verdict: BASELINE SAVED" instead of the full
comparison report.

Phases 1a-1d: replace ambiguous "If timeout / If non-zero" with
explicit "If timeout / Else if non-zero" so the two conditions are
clearly mutually exclusive.
Phase 4 (Resolution Accuracy) had all 6 sub-phases merged but status
still said "In Progress". Phase 5 (TypeScript Migration) had 5.3-5.5
merged via PRs #553, #554, #555, #566 but was listed with stale counts.
Updated both to reflect actual state: Phase 4 complete, Phase 5 at 5/7
with 76 of 283 modules migrated (~27%).
Previous commit incorrectly marked 5.3-5.5 as complete. In reality
76 of 283 src files are .ts (~27%) while 207 remain .js (~73%).
PRs #553, #554, #555, #566 migrated a first wave but left substantial
work in each step: 4 leaf files, 39 core files, 159 orchestration
files. Updated each step with accurate migrated/remaining counts.
The /review skill allowed replying "acknowledged as follow-up" to
reviewer comments without tracking them anywhere. These deferrals
get lost — nobody revisits PR comment threads after merge.

Now: if a fix is genuinely out of scope, the skill must create a
GitHub issue with the follow-up label before replying. The reply
must include the issue link. A matching rule in the Rules section
reinforces the ban.
…s to TypeScript (Phase 5.5)

Migrate 19 remaining JS files to TypeScript across db/, graph/, and domain/:
- db/: connection, migrations, query-builder, index barrel
- graph/algorithms/leiden/: adapter, cpm, modularity, optimiser, partition, index
- graph/algorithms/: louvain, index barrel
- graph/builders/: dependency, structure, temporal, index barrel
- graph/classifiers/: index barrel
- graph/: index barrel
- domain/: queries barrel

Key type additions:
- GraphAdapter, Partition, DetectClustersResult interfaces for Leiden
- LockedDatabase type for advisory-locked DB instances
- DependencyGraphOptions, TemporalGraphOptions for graph builders
- Generic Statement<TRow> in vendor.d.ts for type-safe DB queries

Also fixes pre-existing type errors in module-map.ts (untyped prepare
calls) and generator.ts (null vs undefined argument).
…Script (Phase 5.4)

Migrate 24 JS files to strict TypeScript across three domains:

- domain/graph/builder/stages/ (9 files): all build pipeline stages
- domain/graph/builder/incremental.ts + cycles.ts + journal.ts + change-journal.ts
- domain/search/search/ (6 files): hybrid, semantic, keyword, filters, prepare, cli-formatter
- domain/search/stores/ (2 files): fts5, sqlite-blob
- domain/search/strategies/ (3 files): source, structured, text-utils

Key type fixes: ParseChange stat type alignment (mtimeMs→mtime conversion),
IncrementalStmts duck-typed interfaces for watcher compatibility, NodeWithId
relaxed to Pick<NodeRow> for partial node queries, ESCAPE clause backslash fix.

Passes tsc --noEmit with zero errors. No test regressions.
Remove cycles.js and sqlite-blob.js — their .ts replacements were
committed in the previous commit.
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 23, 2026

Greptile Summary

This PR migrates 24 JS files to strict TypeScript across the builder pipeline stages, search subsystem, and graph domain utilities as part of Phase 5.4. All corresponding .js files are deleted, keeping the codebase in a single-source state.

Key changes:

  • Stage pipeline typed end-to-end: collect-files, detect-changes, insert-nodes, build-edges, build-structure, parse-files, resolve-imports, run-analyses, and finalize all gain typed PipelineContext, typed helper interfaces, and explicit stage-boundary conversions.
  • ParseChange.stat type alignment: detect-changes.ts holds an internal FileStat with mtimeMs (matching fs.Stats) and explicitly converts to { mtime, size } before writing to ctx.parseChanges, ensuring downstream consumers (insert-nodes, healMetadata) receive the correct shape.
  • IncrementalStmts duck-typed interface: Added in incremental.ts to allow the watcher to pass its custom statement wrappers without requiring a BetterSQLite3 import in the watcher.
  • BuildGraphOpts extended: scope and skipRegistry were runtime-used but undeclared; they are now formally typed.
  • NodeRow & { id: number } in generator.ts: NodeRow already includes id: number, making the intersection redundant. Should be simplified to Array<NodeRow>.
  • CLAUDE.md architecture table: Still references domain/graph/cycles.js — this should be updated to cycles.ts following the migration.
  • diffSymbols cast in watcher.ts: Typed as (old: unknown[], new_: unknown[]) => unknown to match incremental.ts's interface, losing the concrete SymbolEntry signature. A shared type would be cleaner long-term.

Confidence Score: 4/5

  • Safe to merge — this is a type-only migration; runtime behaviour is unchanged and all 2032 tests pass.
  • The PR correctly performs all 24 JS→TS migrations, deletes stale counterparts, and the key type boundary fix (mtimeMsmtime) is sound. The three issues flagged are all style/P2: a redundant type intersection, a type-erasure cast, and imprecise null-to-undefined coercions. None affect correctness. The one documentation gap (CLAUDE.md still listing cycles.js) is minor.
  • src/domain/graph/watcher.ts (diffSymbols type erasure), src/domain/search/generator.ts (redundant NodeRow intersection), src/domain/graph/builder/stages/detect-changes.ts (null→undefined coercions)

Important Files Changed

Filename Overview
src/domain/graph/builder/stages/detect-changes.ts JS→TS migration with key fix: converts internal FileStat.mtimeMs to ParseChange.stat.mtime at stage boundary (line 434). Minor: nullundefined coercion via `as FileStat
src/domain/graph/builder/incremental.ts JS→TS migration introducing the IncrementalStmts duck-typed interface for watcher compatibility. Types look correct; the unknown[] signature for diffSymbols is intentional to accept both typed and untyped callers.
src/domain/graph/watcher.ts Updated to import IncrementalStmts and pass a fully-typed EngineOpts object with explicit dataflow: false, ast: false fields. diffSymbols is cast to unknown[] signature, losing type safety at the call site.
src/domain/search/generator.ts Changed SELECT * query result type from an inline partial shape to Array<NodeRow & { id: number }>. The & { id: number } is redundant since NodeRow already declares id.
src/domain/search/stores/fts5.ts JS→TS migration of sanitizeFtsQuery and hasFtsIndex. Logic unchanged; typed parameters added. hasFtsIndex safely returns false on any DB error.
src/domain/graph/builder/stages/insert-nodes.ts JS→TS migration. PrecomputedFileData.stat correctly uses { mtime: number; size: number } to match the converted shape produced by detect-changes.ts at the stage boundary.
src/types.ts Added scope?: string[] and skipRegistry?: boolean to BuildGraphOpts. Both fields were already used at runtime but missing from the type; this is a correct narrowing of the existing runtime behaviour.
src/domain/graph/cycles.ts New TS implementation replacing cycles.js. Clean rewrite delegating to buildDependencyGraph + Tarjan SCC; exports three typed functions. Old .js correctly deleted.
src/domain/graph/change-journal.ts JS→TS migration of change-event tracking. All interfaces well-typed; rotation logic and NDJSON append unchanged from the JS version.
src/domain/search/search/hybrid.ts JS→TS migration implementing RRF fusion of BM25 and semantic results. Type narrowing via inline FusionEntry and RankedItem interfaces is accurate.
src/domain/search/stores/sqlite-blob.ts Minimal JS→TS migration of cosineSim. The old .js was deleted; typed Float32Array parameters replace untyped arrays. Non-null assertions on indexed access are correct since the loop bound ensures values exist.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[collectFiles] --> B[detectChanges]
    B --> C{earlyExit?}
    C -- yes --> Z[done]
    C -- no --> D[parseFiles]
    D --> E[insertNodes]
    E --> F[resolveImports]
    F --> G[buildEdges]
    G --> H[buildStructure]
    H --> I[runAnalyses]
    I --> J[finalize]
    J --> Z

    subgraph "Type Boundary"
        B -- "ChangedFile.stat\n{mtimeMs}" --> B2["map → ParseChange.stat\n{mtime}"]
        B2 --> D
    end

    subgraph "Watcher path"
        W[watchProject] -- "stmts as IncrementalStmts\ndiffSymbols as unknown[]" --> INC[rebuildFile\nincremental.ts]
    end
Loading

Comments Outside Diff (3)

  1. src/domain/search/generator.ts, line 85 (link)

    Redundant & { id: number } intersection

    NodeRow (defined in src/types.ts line 100) already declares id: number as a required field, so NodeRow & { id: number } is identical to plain NodeRow. The PR description mentions "relaxed to Pick<NodeRow, ...>", but the code uses the full NodeRow with a redundant intersection instead.

  2. src/domain/graph/watcher.ts, line 108-117 (link)

    diffSymbols cast erases typed signature

    The diffSymbols function in change-journal.ts has a well-typed signature (oldSymbols: SymbolEntry[], newSymbols: SymbolEntry[]) => SymbolDiff, but here it is cast to (old: unknown[], new_: unknown[]) => unknown to satisfy the incremental.ts interface. This works at runtime but silently discards type information.

    The root cause is that rebuildFile's options.diffSymbols is typed as (old: unknown[], new_: unknown[]) => unknown in incremental.ts (line 516). Widening that parameter type to accept the concrete SymbolEntry shape — or using a shared type — would eliminate the need for this cast and keep call-site type checking intact.

  3. src/domain/graph/builder/stages/detect-changes.ts, line 177 (link)

    fileStat null→undefined via type assertion

    fileStat() is declared to return { mtimeMs: number; size: number } | null, but the cast as FileStat | undefined coerces the null return to undefined at the type level only — the actual runtime value remains null. This works correctly because the subsequent if (!stat) continue; guard is falsy for both null and undefined, but the assertion is technically imprecise.

    A more accurate cast would be as FileStat | null followed by the existing null check, keeping the TypeScript type aligned with what fileStat actually returns. This same pattern appears in tryJournalTier at line 131 and in mtimeAndHashTiers at line 201.

Reviews (2): Last reviewed commit: "fix: resolve merge conflicts with main" | Re-trigger Greptile

Impact: 2 functions changed, 3 affected
@carlos-alm
Copy link
Contributor Author

Resolved merge conflicts with main — Phase 5.5 work was already merged via PR #570, so main's version was kept for all overlapping files (leiden algorithms, builders/index.ts, vendor.d.ts, module-map.ts, ROADMAP.md). The PR's unique Phase 5.4 contributions (builder stages, search subsystem) are preserved.

Re: the P2 about hardcoded dataflow/ast in watcher.ts — this is intentional. Watch mode has never supported dataflow or AST analysis; the TS migration simply made the pre-existing implicit behavior explicit. Passing user config through would require supporting those features in watch mode first.

@greptileai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant