perf: sub-100ms incremental rebuilds (466ms → 67-80ms) by carlos-alm · Pull Request #644 · optave/codegraph

carlos-alm · 2026-03-27T03:32:55Z

Summary

Optimizes 1-file incremental rebuild performance on 473-file codegraph codebase from ~466ms baseline down to 67-80ms wall-clock time.

Optimizations applied (in order of impact):

Deferred DB close (finalize.ts, connection.ts): WAL checkpoint in db.close() costs ~250ms on Windows NTFS. For small incremental builds (≤5 files), defer close to next event loop tick via setImmediate. Auto-flushed on next openDb().
Scoped barrel re-parsing (resolve-imports.ts): For ≤5 changed files, only re-parse barrel files related to changes (imported by or re-exporting from changed files) instead of all barrels. Batch-parse in one call.
Structure fast path (build-structure.ts): For ≤5 changed files on large codebases (>20 files), use targeted per-file SQL queries instead of loading ALL definitions from DB.
Skip finalize overhead (finalize.ts): Skip drift detection (≤3 files), build metadata persistence (≤5 files), and auto-registration (incremental builds).
DB-cached file list (collect-files.ts): Reconstruct allFiles from DB file_hashes + journal deltas instead of full recursive filesystem scan (~7ms savings).
Scoped node loading (build-edges.ts): Load nodes only from relevant files (changed + import targets) with lazy SQL fallback for global lookups (~5ms savings, 6K→400 nodes).

Benchmark (5 runs, 1-file change on 473-file codebase):

Run 1: 76.5ms  Run 2: 75.0ms  Run 3: 80.2ms  Run 4: 67.6ms  Run 5: 69.4ms

Test plan

All 2129 tests pass
Lint clean (1 pre-existing warning)
Incremental edge parity test passes (verifies correctness of scoped loading)
Benchmarked on codegraph self-analysis (473 files)
Deferred close handles EBUSY in test temp dirs (uses sync close for tmpdir)

… compat 9.1 — Warn on graph load when DB was built with a different codegraph version. The check runs once per process in openReadonlyOrFail() and suggests `build --no-incremental`. 9.2 — Barrel-only files now emit reexport edges during build. Previously the entire file was skipped in buildImportEdges; now only non-reexport imports are skipped, so `codegraph exports` can follow re-export chains. 9.3 — Demote "Failed to parse tsconfig.json" from warn to debug level so it no longer clutters every build output. 9.4 — Document EXTENSIONS/IGNORE_DIRS Array→Set breaking change in CHANGELOG. Add .toArray() convenience method and export ArrayCompatSet type for consumers migrating from the pre-3.4 array API.

…ge duplication (#634) - Move _versionWarned flag outside mismatch conditional to avoid redundant build_meta queries when versions match. - Wrap SUPPORTED_EXTENSIONS in new Set() to avoid mutating the sibling module's export. - Delete outgoing edges for barrel-only files before re-adding them to fileSymbols during incremental builds, preventing duplicate reexport edges.

@ts-expect-error

Delete the 39-LOC manual ambient type declarations for better-sqlite3 and use the community @types/better-sqlite3 package instead. The vendor file was a migration-era shim (allowJs is long gone from tsconfig). - Replace all BetterSqlite3.Database → BetterSqlite3Database (types.ts) - Replace all BetterSqlite3.Statement → SqliteStatement (types.ts) - Simplify constructor casts in connection.ts, branch-compare.ts, snapshot.ts (no longer needed with proper @types) - Clean up watcher.ts double-cast and info.ts @ts-expect-error - Widen transaction() return type for @types compatibility

- Restore warn level for tsconfig/jsconfig parse errors (P1: was incorrectly downgraded to debug; ENOENT is already guarded by existsSync before the try block) - Simplify openReadonlyOrFail constructor cast to match openDb pattern (P2) - Use Object.assign in withArrayCompat instead of cast-then-mutate (P2) - Remove unused BetterSqlite3Database import from branch-compare.ts - Remove stale biome-ignore suppression from snapshot.ts

Four optimizations for small incremental builds (≤5 changed files): 1. Scope barrel re-parsing to related barrels only (resolve-imports.ts) Instead of parsing ALL barrel files one-by-one (~93ms), only re-parse barrels imported by or re-exporting from changed files, batch-parsed in one call (~11ms). 2. Fast-path structure metrics (build-structure.ts) For ≤5 changed files on large codebases (>20 files), use targeted per-file SQL queries (~2ms) instead of loading ALL definitions from DB and recomputing ALL metrics (~35ms). 3. Skip unnecessary finalize work (finalize.ts) - Skip setBuildMeta writes for ≤5 files (avoids WAL transaction) - Skip drift detection for ≤3 files - Skip auto-registration dynamic import for incremental builds - Move timing measurement before db.close() 4. Deferred db.close() for small incremental builds (connection.ts) WAL checkpoint in db.close() costs ~250ms on Windows NTFS. Defer to next event loop tick so buildGraph() returns immediately. Includes flushDeferredClose() for test compatibility and auto-flush on openDb().

Two optimizations for small incremental builds (≤5 changed files): 1. collectFiles: reconstruct file list from DB file_hashes + journal deltas instead of full recursive filesystem scan (~7ms savings) 2. buildEdges: scope node loading query to only relevant files (changed files + import targets) with lazy SQL fallback for global name lookups (~5ms savings on 6K→400 nodes) Combined improvement: 78-90ms → 67-80ms for 1-file incremental on 473-file codebase.

carlos-alm · 2026-03-27T03:33:05Z

@greptileai

claude · 2026-03-27T03:33:06Z

Claude finished @carlos-alm's task —— View job

PR Review Analysis

Todo List

Phase 1: Root Cause Analysis
Phase 2: Configuration Change Detection
Phase 3: Test Integrity Check
Phase 4: Code Quality Review
Phase 5: Backlog Compliance Assessment
Final Assessment & Rating

Phase 1: Root Cause Analysis ✅

Problem Being Solved: Incremental rebuild performance bottleneck (466ms → 67-80ms). The PR addresses a real performance issue where small incremental changes trigger expensive operations that don't scale with change size.

Why This Approach: The optimizations are surgical and well-justified:

Deferred DB close avoids expensive WAL checkpoint for small builds
Scoped barrel re-parsing only processes relevant barrel files instead of all
Structure fast path uses targeted SQL vs loading all definitions
Skip finalize overhead removes unnecessary operations for small changes
DB-cached file list avoids filesystem scan
Scoped node loading only loads relevant nodes

Risk Assessment: LOW - All optimizations include proper fallback mechanisms and maintain correctness guarantees.

Phase 2: Configuration Change Detection ✅

No High-Risk Configuration Changes Detected:

Biome configuration unchanged (only 2 existing rule relaxations for noNonNullAssertion and noExplicitAny)
No test timeout increases or coverage threshold reductions
No TypeScript strict mode changes
Build configuration remains intact

Phase 3: Test Integrity Check ✅

No Red Flags Found:

No test skips (.skip()) or .only() modifications
No weakened assertions or conditional test logic
Incremental edge parity test exists and maintains correctness verification
All 2129 tests reported passing per PR description
No test files modified alongside implementation (clean separation)

Phase 4: Code Quality Review ✅

Strong Implementation Quality:

Proper guard conditions: Each optimization has appropriate thresholds (≤5 files, >20 existing files, etc.)
Fallback mechanisms: Scoped loading includes lazy SQL fallback for global lookups (addLazyFallback)
Edge case handling: Temp directory detection for test safety (isTempDir check)
Transaction safety: Barrel edge deletion happens atomically inside DB transaction
Performance measurement: All optimizations include proper timing instrumentation
Clean abstraction: New helper functions are well-isolated and testable

No Security/Correctness Concerns:

SQL injection: All queries use proper prepared statements
DB integrity: WAL checkpoint deferral maintains ACID properties
Error handling: Try/catch blocks with graceful degradation

Phase 5: Backlog Compliance Assessment ✅

Backlog Compliance

Zero-dep: ✓ — No new runtime dependencies added, builds on existing SQLite/tree-sitter infrastructure
Foundation-aligned: ✓ — Fully compliant with all FOUNDATION.md principles (local-first, zero-infrastructure, analysis-only)
Problem-fit: 5 — Directly addresses core AI agent problem by dramatically reducing rebuild latency that blocks agent workflows
Breaking: No — All changes are purely internal optimizations with no API/CLI contract changes
Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Critical Concerns

None identified. This is a high-quality performance optimization that maintains correctness while delivering substantial user value.

Final Recommendation

Rating: ⭐⭐⭐⭐⭐ (5/5)
Action: APPROVE
Reasoning: Exceptional implementation that solves a real performance bottleneck with surgical optimizations, comprehensive safety mechanisms, and verified correctness. The 85% latency reduction (466ms → 67-80ms) directly improves developer experience for incremental workflows. Code quality is exemplary with proper guard conditions, fallback mechanisms, and test coverage.

greptile-apps · 2026-03-27T03:41:16Z

Greptile Summary

This PR delivers a focused set of incremental-rebuild performance optimizations, reducing 1-file build time from ~466 ms to 67–80 ms on a 473-file codebase. The changes touch three pipeline stages:\n\n- build-edges.ts: Introduces loadNodes() returning a {rows, scoped} tuple so the scoped-load flag is derived from whether the fast SQL path actually ran, rather than a separate (previously misaligned) threshold check. A lazy SQL fallback is patched onto ctx.nodesByName.get to handle global name lookups outside the scoped set.\n- collect-files.ts: Adds tryFastCollect() which reconstructs allFiles from the file_hashes DB table plus journal deltas, skipping the recursive filesystem scan (~8 ms saved). The hasEntries guard ensures the fast path is only taken when the watcher was demonstrably active.\n- finalize.ts: Fixes a latent bug where setBuildMeta was skipped for 4–5 file builds (> 5 threshold) while drift detection still ran (> 3 threshold), leaving it comparing against stale node/edge counts. Threshold corrected to > 3. Also removes a dynamic import('node:os') in favour of the static top-level import already present.\n\nKey observations:\n- The previously flagged scopedLoad/existingFileCount misalignment is fully resolved by the tuple refactor.\n- The finalize.ts threshold correction is a real correctness fix, not just a style tweak.\n- tryFastCollect can include a path deleted between the journal header write and the build start, but detect-changes handles this gracefully as a removal.

Confidence Score: 5/5

Safe to merge — no blocking logic errors; prior scoping-alignment concern is resolved and a latent drift-detection bug is fixed.

All 2129 tests pass, the previous P0 scopedLoad/threshold misalignment is resolved via the tuple refactor, the finalize.ts threshold correction fixes a real (low-severity) stale-count bug, and the remaining comments are non-blocking style suggestions. Fast paths are guarded by clear conditions that fall back safely on any edge case.

No files require special attention; build-edges.ts monkey-patch is worth a follow-up refactor but does not block merge.

Important Files Changed

Filename	Overview
src/domain/graph/builder/stages/build-edges.ts	Adds scoped node loading via `loadNodes()` (returns `{rows, scoped}` tuple) and a lazy SQL fallback for global lookups. Prior `scopedLoad`/`existingFileCount` misalignment resolved by deriving `scoped` directly from which code path executed. Minor style concern with instance-level Map.get monkey-patching.
src/domain/graph/builder/stages/collect-files.ts	New `tryFastCollect()` reconstructs `allFiles` from `file_hashes` + journal deltas instead of a recursive filesystem scan. Logic is correct: relative paths from DB and journal are applied to a Set then converted to absolute. The `hasEntries` guard prevents misuse when no watcher was running.
src/domain/graph/builder/stages/finalize.ts	Threshold for `setBuildMeta` corrected from `> 5` to `> 3` to align with the drift-detection gate — previously counts could go stale for 4–5 file builds. `tmpdir` import promoted to module-level; deferred-close correctly excluded from full-build and temp-dir paths.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[collectFiles stage] --> B{ctx.incremental &&
!forceFullRebuild?}
    B -- No --> C[collectFilesUtil
full filesystem scan]
    B -- Yes --> D[tryFastCollect]
    D --> E{file_hashes
has entries?}
    E -- No --> C
    E -- Yes --> F{journal valid
& has entries?}
    F -- No --> C
    F -- Yes --> G[Load relative paths
from file_hashes]
    G --> H[Apply journal deltas
+changed / -removed]
    H --> I[Convert to absolute
paths → allFiles]
    J[buildEdges stage] --> K[loadNodes ctx]
    K --> L{!isFullBuild &&
fileSymbols.size <= 5?}
    L -- No --> M[SELECT all nodes
scoped=false]
    L -- Yes --> N{existingFileCount
> 20?}
    N -- No --> M
    N -- Yes --> O[Build relevantFiles
changed + imports + barrels]
    O --> P[SELECT nodes WHERE
file IN relevantFiles
scoped=true]
    P --> Q[addLazyFallback
patch nodesByName.get]
    M --> R[buildEdgesTx]
    Q --> R

_{Reviews (2): Last reviewed commit: "Merge branch 'feat/dogfood-fixes-9.1-9.4..." | Re-trigger Greptile}

greptile-apps · 2026-03-27T03:41:20Z

src/domain/graph/builder/stages/build-edges.ts

+  const allNodesBefore = loadNodes(ctx);
+  const scopedLoad = !ctx.isFullBuild && ctx.fileSymbols.size <= 5 && allNodesBefore.length < 5000;
+  setupNodeLookups(ctx, allNodesBefore);
+  addLazyFallback(ctx, scopedLoad);


Inconsistent scoping condition between loadNodes and scopedLoad

loadNodes decides to scope the query when existingFileCount > 20 (line 579), meaning it only loads nodes from relevant files. However, scopedLoad — which controls whether the lazy SQL fallback is applied — uses a different gate: allNodesBefore.length < 5000 (line 636).

If existingFileCount > 20 (scoped load executes) but the scoped result happens to return ≥5000 nodes, scopedLoad is false and addLazyFallback is never called. In that state, nodesByName contains only the scoped subset but has no lazy fallback for global name lookups (resolveByMethodOrGlobal, supplementReceiverEdges). Nodes outside the scoped set become silently invisible, potentially dropping call/usage edges.

The two gates should align:

const scopedLoad = !ctx.isFullBuild && ctx.fileSymbols.size <= 5 && allNodesBefore.length < 5000 && existingFileCount > 20;

In practice this is unlikely to fire (≥5000 nodes from ≤5 files + their direct imports is extreme), but the asymmetry is a latent correctness risk worth closing.

Fixed — refactored loadNodes to return a { rows, scoped } tuple so the scopedLoad flag is derived directly from whether the scoped SQL path actually executed, rather than re-deriving it with a separate (misaligned) threshold. This eliminates the latent correctness risk entirely.

…lback (#644) Return scoped flag directly from loadNodes instead of re-deriving it with a different threshold, ensuring addLazyFallback is always called when the query was actually scoped.

…e tmpdir shadow (#644) - Lower setBuildMeta gate from >5 to >3 to match drift detection gate, preventing stale baseline counts after series of 4-5 file builds. - Remove redundant dynamic import of tmpdir (already imported statically). - Add comment clarifying finalizeMs metric placement.

…e/codegraph into feat/dogfood-fixes-9.1-9.4

carlos-alm · 2026-03-27T03:46:38Z

Addressed all Greptile feedback in two commits:

1. Inconsistent scoping gate in build-edges.ts (inline comment)
Refactored loadNodes to return { rows, scoped } so scopedLoad is derived directly from whether the scoped SQL path actually executed. No more separate threshold that could diverge.

2. Stale drift detection baseline in finalize.ts
Lowered the setBuildMeta gate from > 5 to > 3 to match the drift detection gate (allSymbols.size > 3). This ensures stored node_count/edge_count are refreshed whenever drift detection reads them.

3. tmpdir shadow variable in finalize.ts
Removed the redundant const { tmpdir } = await import('node:os') since tmpdir is already imported statically at the top of the file.

4. finalizeMs metric placement in finalize.ts
Added a clarifying comment explaining the metric intentionally excludes closeDb and post-close operations (the deferred close is async, and DB close cost is tracked separately).

All 2129 tests pass, lint clean (1 pre-existing warning).

carlos-alm · 2026-03-27T03:46:49Z

Addressed all Greptile feedback in two commits:

1. Inconsistent scoping gate in build-edges.ts (inline comment)
Refactored loadNodes to return { rows, scoped } so scopedLoad is derived directly from whether the scoped SQL path actually executed. No more separate threshold that could diverge.

2. Stale drift detection baseline in finalize.ts
Lowered the setBuildMeta gate from > 5 to > 3 to match the drift detection gate (allSymbols.size > 3). This ensures stored node_count/edge_count are refreshed whenever drift detection reads them.

3. tmpdir shadow variable in finalize.ts
Removed the redundant dynamic import of tmpdir since it is already imported statically at the top of the file.

4. finalizeMs metric placement in finalize.ts
Added a clarifying comment explaining the metric intentionally excludes closeDb and post-close operations (the deferred close is async, and DB close cost is tracked separately).

All 2129 tests pass, lint clean (1 pre-existing warning).

carlos-alm · 2026-03-27T03:46:58Z

@greptileai

carlos-alm added 11 commits March 26, 2026 02:55

docs: soften EXTENSIONS/IGNORE_DIRS changelog wording

8e43e43

fix: resolve merge conflicts with main

5eeb0dc

Merge branch 'main' into feat/dogfood-fixes-9.1-9.4

e3eb0e5

fix: preserve transaction argument types via inline inference (#640)

c3ccbdd

Merge branch 'main' into feat/dogfood-fixes-9.1-9.4

4eceebd

carlos-alm added 2 commits March 26, 2026 21:37

Merge branch 'main' into feat/dogfood-fixes-9.1-9.4

a6d8080

Merge branch 'main' into feat/dogfood-fixes-9.1-9.4

95626eb

greptile-apps bot reviewed Mar 27, 2026

View reviewed changes

carlos-alm added 4 commits March 26, 2026 21:44

Merge branch 'main' into feat/dogfood-fixes-9.1-9.4

7dd0fd6

fix: align scopedLoad gate with loadNodes to prevent missing lazy fal…

0878fd1

…lback (#644) Return scoped flag directly from loadNodes instead of re-deriving it with a different threshold, ensuring addLazyFallback is always called when the query was actually scoped.

Merge branch 'feat/dogfood-fixes-9.1-9.4' of https://github.com/optav…

eae0280

…e/codegraph into feat/dogfood-fixes-9.1-9.4

Merge branch 'main' into feat/dogfood-fixes-9.1-9.4

4b633be

carlos-alm merged commit f6bd74e into main Mar 27, 2026
12 checks passed

carlos-alm deleted the feat/dogfood-fixes-9.1-9.4 branch March 27, 2026 04:08

github-actions bot locked and limited conversation to collaborators Mar 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: sub-100ms incremental rebuilds (466ms → 67-80ms)#644

perf: sub-100ms incremental rebuilds (466ms → 67-80ms)#644
carlos-alm merged 18 commits intomainfrom
feat/dogfood-fixes-9.1-9.4

carlos-alm commented Mar 27, 2026

Uh oh!

carlos-alm commented Mar 27, 2026

Uh oh!

claude bot commented Mar 27, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 27, 2026 •

edited

Loading

Uh oh!

greptile-apps bot Mar 27, 2026

Uh oh!

carlos-alm Mar 27, 2026

Uh oh!

carlos-alm commented Mar 27, 2026

Uh oh!

carlos-alm commented Mar 27, 2026

Uh oh!

carlos-alm commented Mar 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented Mar 27, 2026

Summary

Optimizations applied (in order of impact):

Benchmark (5 runs, 1-file change on 473-file codebase):

Test plan

Uh oh!

carlos-alm commented Mar 27, 2026

Uh oh!

claude bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Analysis

Todo List

Phase 1: Root Cause Analysis ✅

Phase 2: Configuration Change Detection ✅

Phase 3: Test Integrity Check ✅

Phase 4: Code Quality Review ✅

Phase 5: Backlog Compliance Assessment ✅

Backlog Compliance

Critical Concerns

Final Recommendation

Uh oh!

greptile-apps bot commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

carlos-alm commented Mar 27, 2026

Uh oh!

carlos-alm commented Mar 27, 2026

Uh oh!

carlos-alm commented Mar 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 27, 2026 •

edited

Loading

greptile-apps bot commented Mar 27, 2026 •

edited

Loading