fix(build): JSONC parse and version-aware incremental rebuilds#631
fix(build): JSONC parse and version-aware incremental rebuilds#631carlos-alm merged 4 commits intomainfrom
Conversation
|
Claude finished @carlos-alm's task —— View job Root Cause AnalysisProblem Being Solved: Two legitimate bugs in the build pipeline:
Why This Approach: Both fixes address root causes correctly:
Risk Assessment: ✅ Low Risk - These are defensive fixes that prevent subtle data corruption and parse failures. Technical Quality AssessmentFix 1: JSONC Comment Stripping (
|
Greptile SummaryThis PR resolves two independent bugs in the incremental build pipeline: a JSONC comment-stripping regex that corrupted Key changes:
Confidence Score: 5/5Safe to merge — both bugs are fixed correctly, all prior review concerns have been fully resolved, and a targeted test covers the JSONC regression. The two-pass comment-stripping race condition is closed by the unified alternation regex, which is logically correct and well-tested. The version-mismatch guard is minimal, follows the identical pattern of the existing engine/schema checks, and has no risk of incorrectly skipping a full rebuild. The shared No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[codegraph build --incremental] --> B[checkEngineSchemaMismatch]
B --> C{engine changed?}
C -- yes --> D[forceFullRebuild = true]
C -- no --> E{schema_version changed?}
D --> E
E -- yes --> F[forceFullRebuild = true]
E -- no --> G{codegraph_version changed? NEW}
F --> G
G -- yes --> H[forceFullRebuild = true]
G -- no --> I[Incremental build proceeds]
H --> I
I --> J[finalize: setBuildMeta stores codegraph_version]
K[loadPathAliases] --> L[Read tsconfig.json / jsconfig.json]
L --> M[Single-pass regex - strings first then comments - NEW]
M --> N[Strip trailing commas]
N --> O[JSON.parse]
Reviews (2): Last reviewed commit: "fix: combine JSONC comment stripping int..." | Re-trigger Greptile |
src/domain/graph/builder/pipeline.ts
Outdated
| const CODEGRAPH_VERSION = ( | ||
| JSON.parse( | ||
| fs.readFileSync(path.join(__pipelineDir, '..', '..', '..', '..', 'package.json'), 'utf-8'), | ||
| ) as { version: string } | ||
| ).version; |
There was a problem hiding this comment.
Duplicated
CODEGRAPH_VERSION constant
CODEGRAPH_VERSION is now defined in two places — once here in pipeline.ts (using ../../../../package.json, 4 levels up) and once in stages/finalize.ts (using ../../../../../package.json, 5 levels up). Both resolve to the same root package.json, so there's no correctness bug today, but this duplication is a latent maintenance risk: if the build output directory structure ever changes, both files would need updating independently and a mismatch would be a silent bug (version stored by finalize.ts ≠ version read back by pipeline.ts).
Consider extracting CODEGRAPH_VERSION into a shared module (e.g., src/shared/version.ts) and importing it from both pipeline.ts and finalize.ts, similar to how other shared constants are handled in src/shared/constants.ts.
There was a problem hiding this comment.
Fixed in 1a79a36. Extracted CODEGRAPH_VERSION to src/shared/version.ts and updated both pipeline.ts and finalize.ts to import from the shared module. This eliminates the duplicate package.json reading and the fragile relative path differences.
There was a problem hiding this comment.
Fixed in 1a79a36. Extracted CODEGRAPH_VERSION to src/shared/version.ts and updated both pipeline.ts and finalize.ts to import from the shared module. This eliminates the duplicate package.json reading and the fragile relative path differences.
src/domain/graph/builder/helpers.ts
Outdated
| .replace(/\/\*[\s\S]*?\*\//g, '') | ||
| .replace(/("(?:[^"\\]|\\.)*")|\/\/.*$/gm, (_, str) => str ?? '') |
There was a problem hiding this comment.
Block comment stripping still runs before string protection
The // stripping is now string-aware (step 2), but the block comment regex on line 135 runs first, before any string values are protected. A tsconfig value like "value": "check /* pattern */ end" would have its content corrupted before the string-aware step 2 can preserve it. This is a pre-existing issue — not introduced by this PR — but since comment handling is being improved here, it's worth noting as a natural follow-up.
The complete fix would interleave string matching into the block comment pass as well:
// Replace both block and line comments while preserving quoted strings
.replace(/("(?:[^"\\]|\\.)*")|\/\*[\s\S]*?\*\/|\/\/.*$/gm, (_, str) => str ?? '')This single regex handles all three cases — quoted strings (preserve), block comments (remove), and line comments (remove) — in a single pass, with string matching taking priority via left-to-right alternation.
There was a problem hiding this comment.
Fixed in 134c829. Merged block comment, line comment, and string preservation into a single regex pass. Left-to-right alternation ensures quoted strings match first and are preserved, while both block and line comments outside strings are stripped in one pass. This closes the pre-existing gap where block comments inside string values would have been corrupted.
Eliminates duplicate version reading from package.json in pipeline.ts and finalize.ts. Both now import from a single shared module, removing the maintenance risk of mismatched relative paths.
…631) Merges block comment, line comment, and string preservation into one regex pass so block comments inside string values are also protected.
…remental rebuilds The // comment regex in loadPathAliases stripped // inside JSON string values (e.g. https:// URLs), causing "Bad control character" parse failures. Fix: match quoted strings first and preserve them, only stripping // outside strings. Add codegraph_version check to checkEngineSchemaMismatch so incremental builds auto-promote to full rebuild when the codegraph version changes, preventing stale graph data after upgrades.
Eliminates duplicate version reading from package.json in pipeline.ts and finalize.ts. Both now import from a single shared module, removing the maintenance risk of mismatched relative paths.
…631) Merges block comment, line comment, and string preservation into one regex pass so block comments inside string values are also protected.
134c829 to
d4a6756
Compare
Summary
//comment-stripping regex inloadPathAliasesalso matched//inside JSON string values (e.g.https://json-schema.org/...), producingBad control character in string literalon every build. Fixed by matching quoted strings first and preserving them — only//outside strings is stripped.codegraph_version. Upgrading codegraph (e.g. 3.3.1 → 3.4.0) with improved extractors silently used stale data (14% edge gap). Fixed by adding acodegraph_versioncheck that auto-promotes to full rebuild when the version changes.Test plan
$schemaURL containing//alongside comments parses correctlynpx biome checkclean on all changed filescodegraph buildno longer warnsFailed to parse tsconfig.jsonon repos with JSONC tsconfigCodegraph version changed (...), promoting to full rebuild