fix: free leaked WASM trees in native engine typeMap backfill#534
fix: free leaked WASM trees in native engine typeMap backfill#534carlos-alm merged 8 commits intomainfrom
Conversation
Three targeted fixes for the +28–56% query latency regression: 1. Pin benchmark hub target to stable function names (buildGraph, openDb, loadConfig) instead of auto-selecting the most-connected node. Barrel/type files becoming the hub made version-to-version comparison meaningless. 2. Gate implementors queries in bfsTransitiveCallers — check once whether the graph has any 'implements' edges before doing per-node findNodeById + findImplementors lookups. Skips all implementor overhead for codebases without interface/trait hierarchies. 3. Cache loadConfig() results per cwd. The config file is read from disk on every fnImpactData and diffImpactData call; caching eliminates redundant fs.existsSync + readFileSync + JSON.parse per query invocation. Impact: 5 functions changed, 123 affected
…handle Prevent callers from mutating the cached config object by returning a deep clone on cache hits. Add try/finally to selectTargets() so the database handle is closed even if a query throws. Impact: 2 functions changed, 1 affected
The embedding benchmark's npm mode installs codegraph into a temp dir,
but @huggingface/transformers is a devDependency and not included.
All 6 model workers crash on import, producing symbols: 0, models: {}.
Install it explicitly from the local devDependencies version, matching
the existing pattern for native platform packages. Also add a guard in
update-embedding-report.js to reject empty results and fail loudly
instead of silently overwriting valid benchmark data.
…gression Impact: 3 functions changed, 8 affected
The typeMap backfill path in parseFilesAuto and backfillTypeMap called wasmExtractSymbols but never freed the returned WASM tree objects. Over repeated builds (benchmarks, watch mode), hundreds of trees accumulated in WASM linear memory, eventually corrupting V8 state and crashing the native addon with ACCESS_VIOLATION / has_exception(). Two fixes: 1. Free WASM trees immediately after extracting typeMap data in both backfillTypeMap() and the parseFilesAuto() bulk backfill loop. 2. Skip backfill entirely for JS files — only TS/TSX have type annotations that WASM can extract. The native engine already handles JS `new Expr()` patterns, so re-parsing all JS files with WASM was pure waste. Closes #530 Impact: 2 functions changed, 2 affected
Greptile SummaryThis PR fixes a WASM linear memory leak in the native engine's typeMap backfill path:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[parseFilesAuto / parseFileAuto] --> B{Native engine?}
B -- No --> C[WASM path: wasmExtractSymbols\nstores tree in symbols._tree]
B -- Yes --> D[native.parseFiles / parseFile]
D --> E{typeMap empty?}
E -- No --> F[Return result]
E -- Yes --> G{ext in .ts or .tsx?\nNEW: skip JS files}
G -- No --> F
G -- Yes --> H[backfillTypeMap]
H --> I[createParsers]
I --> J[wasmExtractSymbols]
J --> K{symbols?.typeMap?}
K -- No --> L[return typeMap empty]
K -- Yes --> M[build Map from typeMap]
M --> N[return typeMap, backfilled=true]
L --> O[finally: extracted?.tree?.delete\nNEW: always free WASM tree]
N --> O
O --> F
Last reviewed commit: "fix: address review ..." |
| const result = resolveSecrets(applyEnvOverrides(merged)); | ||
| _configCache.set(cwd, result); | ||
| return result; |
There was a problem hiding this comment.
Inconsistent return value on first vs. subsequent cache hits
On the first call for a given cwd, loadConfig returns the actual cached object stored in _configCache. All subsequent calls correctly return a structuredClone. If any caller mutates the returned config (e.g., patching llm.apiKey or similar), the cached entry is silently corrupted and all future callers receive the mutated data.
Both return sites after populating the cache should return a clone for consistency:
| const result = resolveSecrets(applyEnvOverrides(merged)); | |
| _configCache.set(cwd, result); | |
| return result; | |
| const result = resolveSecrets(applyEnvOverrides(merged)); | |
| _configCache.set(cwd, result); | |
| return structuredClone(result); |
There was a problem hiding this comment.
Addressed. The merge with main already brought in the structuredClone fix for both cache-population return sites (commit f8016c6 on main). The merged resolution preserves structuredClone(result) when storing to cache, so the first caller gets the raw object and the cache holds an isolated copy.
| const defaults = resolveSecrets(applyEnvOverrides({ ...DEFAULTS })); | ||
| _configCache.set(cwd, defaults); | ||
| return defaults; |
There was a problem hiding this comment.
Same raw-reference return for the defaults branch
The defaults code path has the same issue — the object stored in the cache is returned directly on the first call. This should also return a structuredClone to match the behaviour of the early-return cache hit path above:
| const defaults = resolveSecrets(applyEnvOverrides({ ...DEFAULTS })); | |
| _configCache.set(cwd, defaults); | |
| return defaults; | |
| const defaults = resolveSecrets(applyEnvOverrides({ ...DEFAULTS })); | |
| _configCache.set(cwd, defaults); | |
| return structuredClone(defaults); |
There was a problem hiding this comment.
Addressed. Same as above -- the merge with main brought in structuredClone(defaults) for this path as well.
The Rust edge builder only matched `kind == "class"` when looking up source nodes and targets for extends/implements edges. This caused all `impl Trait for Struct` relationships (and any non-class hierarchy) to be silently dropped — producing 0 implements edges for Rust sources while WASM correctly found 9. Align the three kind filter sets with the JS-side constants: - Source: class, struct, record, enum (was: class only) - Extends targets: class, struct, trait, record (was: class only) - Implements targets: interface, class, trait (was: interface, class) Fixes #530 (partial — implements parity gap) Impact: 1 functions changed, 0 affected
Impact: 1 functions changed, 0 affected
- Consolidate duplicated tree.delete() in backfillTypeMap into a single finally block, preventing future early-return paths from leaking trees - Skip WASM typeMap backfill for JS files in parseFileAuto and parseFileIncremental single-file paths, matching the bulk path behavior Impact: 3 functions changed, 2 affected
|
@greptileai re-review: merged origin/main to resolve config.js conflicts (structuredClone fix already on main), refactored backfillTypeMap to use a single finally block for tree cleanup, and added TS/TSX extension guard to parseFileAuto and parseFileIncremental single-file paths to skip unnecessary WASM parsing for JS files. |
Summary
backfillTypeMap()and theparseFilesAuto()bulk backfill loop — previously these trees were never freed, accumulating in WASM linear memory across repeated buildsnew Expr()patterns, so re-parsing all JS files with WASM was pure waste and the primary leak vector for JS-only codebases like codegraph itselfCloses #530
Root cause
When the native engine is used,
parseFilesAutocallsnative.parseFiles()then checks if any files need typeMap backfill via WASM. For every file wheretypeMapwas empty (all JS files — they have no type annotations), it calledwasmExtractSymbols()which returns a{ symbols, tree }object. Thetree(WASM tree-sitter Tree requiring explicit.delete()) was never freed. Over repeated builds in benchmarks, hundreds of leaked trees accumulated in WASM linear memory, eventually corrupting V8 internal state and crashing the native addon with ACCESS_VIOLATION (0xC0000005) or V8has_exception()assertion failures.Test plan
node scripts/benchmark.js— native worker should complete all iterations without crashingnode scripts/query-benchmark.js— native worker should complete without V8 assertion failure