Skip to content

fix(build): JSONC parse and version-aware incremental rebuilds#631

Merged
carlos-alm merged 4 commits intomainfrom
fix/tsconfig-parse-and-version-check
Mar 26, 2026
Merged

fix(build): JSONC parse and version-aware incremental rebuilds#631
carlos-alm merged 4 commits intomainfrom
fix/tsconfig-parse-and-version-check

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • BUG 1 — tsconfig.json parse failure: The // comment-stripping regex in loadPathAliases also matched // inside JSON string values (e.g. https://json-schema.org/...), producing Bad control character in string literal on every build. Fixed by matching quoted strings first and preserving them — only // outside strings is stripped.
  • BUG 2 — No version mismatch warning: Incremental builds checked engine and schema version but not codegraph_version. Upgrading codegraph (e.g. 3.3.1 → 3.4.0) with improved extractors silently used stale data (14% edge gap). Fixed by adding a codegraph_version check that auto-promotes to full rebuild when the version changes.

Test plan

  • Existing JSONC test still passes
  • New test: tsconfig with $schema URL containing // alongside comments parses correctly
  • npx biome check clean on all changed files
  • Manual: codegraph build no longer warns Failed to parse tsconfig.json on repos with JSONC tsconfig
  • Manual: After version upgrade, incremental build logs Codegraph version changed (...), promoting to full rebuild

@claude
Copy link

claude bot commented Mar 26, 2026

Claude finished @carlos-alm's task —— View job


Root Cause Analysis

Problem Being Solved: Two legitimate bugs in the build pipeline:

  1. JSONC Parse Failure: The comment-stripping regex \/\/.*$ was matching // inside JSON string values (e.g., "https://json-schema.org/..."), causing Bad control character in string literal errors during tsconfig.json parsing
  2. Stale Incremental Data: After codegraph upgrades, incremental builds checked engine/schema versions but ignored codegraph_version, silently using outdated extraction results (14% edge gap reported)

Why This Approach: Both fixes address root causes correctly:

  • The regex fix uses proper string-aware matching to preserve URLs while stripping comments
  • The version check follows the existing pattern for engine/schema checks, triggering full rebuilds when extraction logic changes

Risk Assessment: ✅ Low Risk - These are defensive fixes that prevent subtle data corruption and parse failures.

Technical Quality Assessment

Fix 1: JSONC Comment Stripping (helpers.ts:136)

- .replace(/\/\/.*$/gm, '')
+ .replace(/("(?:[^"\\]|\\.)*")|\/\/.*$/gm, (_, str) => str ?? '')

Correct approach: Uses capture groups to match quoted strings first, preserving them while stripping line comments. The regex "(?:[^"\\]|\\.)*" properly handles escaped quotes.

Fix 2: Version Check (pipeline.ts:68-74)

const prevVersion = getBuildMeta(db, 'codegraph_version');
if (prevVersion && prevVersion !== CODEGRAPH_VERSION) {
  info(`Codegraph version changed (${prevVersion}${CODEGRAPH_VERSION}), promoting to full rebuild.`);
  ctx.forceFullRebuild = true;
}

Consistent pattern: Follows identical logic to engine/schema version checks. The codegraph_version is stored during finalization (confirmed in finalize.ts:76).

Test Coverage (builder.test.ts:203-223)

Appropriate test: Covers the specific case that was failing - JSONC with $schema URLs containing // alongside comments. Tests both parsing success and alias extraction correctness.

Configuration Change Detection

No risky configuration changes detected - No linting rules relaxed, no test thresholds lowered, no build config modifications that hide issues.

Test Integrity

Test changes are appropriate - Single focused test addition that directly validates the bug fix. No weakened assertions or removed validations.

Backlog Compliance

  • Zero-dep: ✅ No new runtime dependencies
  • Foundation-aligned: ✅ Improves core parsing reliability, no principle violations
  • Problem-fit: 4/5 - Prevents build failures and data staleness that would confuse AI agents with wrong dependency info
  • Breaking: No - Purely additive fixes with backward compatibility
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Code Quality Issues

Minor Concerns:

  1. Duplicate Version Reading: Both pipeline.ts:30-34 and finalize.ts:15-19 read package.json for version. Consider extracting to shared constant.
  2. Path Construction: The __pipelineDir path manipulation could be fragile across different environments.

No Security/Performance Issues Detected

Final Assessment

## Root Cause Analysis  
**Problem Being Solved**: Two legitimate build pipeline bugs - JSONC parse failures from over-aggressive comment regex, and stale incremental data after version upgrades
**Why This Approach**: Direct fixes at root cause level using established patterns in codebase
**Risk Assessment**: Low risk, defensive improvements

## Backlog Compliance
- Zero-dep: ✅ No new dependencies  
- Foundation-aligned: ✅ Improves core parsing reliability
- Problem-fit: 4/5 - Prevents build failures that confuse AI agents
- Breaking: No - Backward compatible fixes
- Tier: 1

## Critical Concerns
- Minor: Duplicate package.json version reading logic could be consolidated
- All changes address legitimate bugs at root cause level

## Final Recommendation
- Rating: ⭐⭐⭐⭐☆ (4/5)
- Action: **APPROVE**
- Reasoning: Well-targeted bug fixes with appropriate tests, no configuration hiding, follows established patterns. Minor code duplication doesn't warrant blocking.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 26, 2026

Greptile Summary

This PR resolves two independent bugs in the incremental build pipeline: a JSONC comment-stripping regex that corrupted // inside JSON string values (e.g. $schema URLs), and a missing codegraph_version check that caused incremental builds to silently reuse stale graph data after a version upgrade. It also extracts CODEGRAPH_VERSION into a shared module (src/shared/version.ts), addressing the duplication concern raised in the previous review cycle.

Key changes:

  • helpers.ts: Single-pass regex ("(?:[^"\\]|\\.)*")|\/\*[\s\S]*?\*\/|\/\/.*$ replaces the prior two-pass approach; string literals take priority via left-to-right alternation, so URLs containing // or values containing /* ... */ are preserved faithfully before any comment is stripped.
  • pipeline.ts: checkEngineSchemaMismatch now includes a codegraph_version guard — when the stored version differs from the running binary, an incremental build is promoted to a full rebuild. The guard follows the same conditional pattern as the existing engine and schema-version checks.
  • src/shared/version.ts: New module that reads package.json once at load time and exports CODEGRAPH_VERSION. Both pipeline.ts and finalize.ts now import from here, eliminating the fragile multi-level relative paths used previously.
  • tests/unit/builder.test.ts: New test exercises the exact failure scenario (JSONC with a $schema URL plus both comment styles), confirming that baseUrl and path aliases are still extracted correctly.

Confidence Score: 5/5

Safe to merge — both bugs are fixed correctly, all prior review concerns have been fully resolved, and a targeted test covers the JSONC regression.

The two-pass comment-stripping race condition is closed by the unified alternation regex, which is logically correct and well-tested. The version-mismatch guard is minimal, follows the identical pattern of the existing engine/schema checks, and has no risk of incorrectly skipping a full rebuild. The shared version.ts module correctly resolves the package.json path for both compiled and source execution contexts. No critical or blocking issues remain.

No files require special attention.

Important Files Changed

Filename Overview
src/domain/graph/builder/helpers.ts Unified JSONC comment-stripping into a single regex pass; quoted strings are preserved via left-to-right alternation, fixing // inside URLs corrupting JSON.parse.
src/shared/version.ts New shared module that reads CODEGRAPH_VERSION once from package.json at module load time, eliminating the duplicated and fragile relative-path reads in pipeline.ts and finalize.ts.
src/domain/graph/builder/pipeline.ts Added codegraph_version mismatch check to checkEngineSchemaMismatch, promoting to full rebuild when the stored version differs from the current binary — mirrors existing engine/schema checks.
src/domain/graph/builder/stages/finalize.ts Removed local CODEGRAPH_VERSION constant (and fragile 5-level-up package.json read); now imports from src/shared/version.ts. No behavioural change.
tests/unit/builder.test.ts Added test verifying that $schema URLs containing // alongside both // and /* */ style comments are parsed correctly by loadPathAliases.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[codegraph build --incremental] --> B[checkEngineSchemaMismatch]
    B --> C{engine changed?}
    C -- yes --> D[forceFullRebuild = true]
    C -- no --> E{schema_version changed?}
    D --> E
    E -- yes --> F[forceFullRebuild = true]
    E -- no --> G{codegraph_version changed? NEW}
    F --> G
    G -- yes --> H[forceFullRebuild = true]
    G -- no --> I[Incremental build proceeds]
    H --> I
    I --> J[finalize: setBuildMeta stores codegraph_version]

    K[loadPathAliases] --> L[Read tsconfig.json / jsconfig.json]
    L --> M[Single-pass regex - strings first then comments - NEW]
    M --> N[Strip trailing commas]
    N --> O[JSON.parse]
Loading

Reviews (2): Last reviewed commit: "fix: combine JSONC comment stripping int..." | Re-trigger Greptile

Comment on lines +30 to +34
const CODEGRAPH_VERSION = (
JSON.parse(
fs.readFileSync(path.join(__pipelineDir, '..', '..', '..', '..', 'package.json'), 'utf-8'),
) as { version: string }
).version;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Duplicated CODEGRAPH_VERSION constant

CODEGRAPH_VERSION is now defined in two places — once here in pipeline.ts (using ../../../../package.json, 4 levels up) and once in stages/finalize.ts (using ../../../../../package.json, 5 levels up). Both resolve to the same root package.json, so there's no correctness bug today, but this duplication is a latent maintenance risk: if the build output directory structure ever changes, both files would need updating independently and a mismatch would be a silent bug (version stored by finalize.ts ≠ version read back by pipeline.ts).

Consider extracting CODEGRAPH_VERSION into a shared module (e.g., src/shared/version.ts) and importing it from both pipeline.ts and finalize.ts, similar to how other shared constants are handled in src/shared/constants.ts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1a79a36. Extracted CODEGRAPH_VERSION to src/shared/version.ts and updated both pipeline.ts and finalize.ts to import from the shared module. This eliminates the duplicate package.json reading and the fragile relative path differences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1a79a36. Extracted CODEGRAPH_VERSION to src/shared/version.ts and updated both pipeline.ts and finalize.ts to import from the shared module. This eliminates the duplicate package.json reading and the fragile relative path differences.

Comment on lines +135 to +136
.replace(/\/\*[\s\S]*?\*\//g, '')
.replace(/("(?:[^"\\]|\\.)*")|\/\/.*$/gm, (_, str) => str ?? '')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Block comment stripping still runs before string protection

The // stripping is now string-aware (step 2), but the block comment regex on line 135 runs first, before any string values are protected. A tsconfig value like "value": "check /* pattern */ end" would have its content corrupted before the string-aware step 2 can preserve it. This is a pre-existing issue — not introduced by this PR — but since comment handling is being improved here, it's worth noting as a natural follow-up.

The complete fix would interleave string matching into the block comment pass as well:

// Replace both block and line comments while preserving quoted strings
.replace(/("(?:[^"\\]|\\.)*")|\/\*[\s\S]*?\*\/|\/\/.*$/gm, (_, str) => str ?? '')

This single regex handles all three cases — quoted strings (preserve), block comments (remove), and line comments (remove) — in a single pass, with string matching taking priority via left-to-right alternation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 134c829. Merged block comment, line comment, and string preservation into a single regex pass. Left-to-right alternation ensures quoted strings match first and are preserved, while both block and line comments outside strings are stripped in one pass. This closes the pre-existing gap where block comments inside string values would have been corrupted.

carlos-alm added a commit that referenced this pull request Mar 26, 2026
Eliminates duplicate version reading from package.json in pipeline.ts
and finalize.ts. Both now import from a single shared module, removing
the maintenance risk of mismatched relative paths.
carlos-alm added a commit that referenced this pull request Mar 26, 2026
…631)

Merges block comment, line comment, and string preservation into one
regex pass so block comments inside string values are also protected.
…remental rebuilds

The // comment regex in loadPathAliases stripped // inside JSON string
values (e.g. https:// URLs), causing "Bad control character" parse
failures. Fix: match quoted strings first and preserve them, only
stripping // outside strings.

Add codegraph_version check to checkEngineSchemaMismatch so incremental
builds auto-promote to full rebuild when the codegraph version changes,
preventing stale graph data after upgrades.
Eliminates duplicate version reading from package.json in pipeline.ts
and finalize.ts. Both now import from a single shared module, removing
the maintenance risk of mismatched relative paths.
…631)

Merges block comment, line comment, and string preservation into one
regex pass so block comments inside string values are also protected.
@carlos-alm carlos-alm force-pushed the fix/tsconfig-parse-and-version-check branch from 134c829 to d4a6756 Compare March 26, 2026 08:54
@carlos-alm
Copy link
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit d593404 into main Mar 26, 2026
12 checks passed
@carlos-alm carlos-alm deleted the fix/tsconfig-parse-and-version-check branch March 26, 2026 21:36
@github-actions github-actions bot locked and limited conversation to collaborators Mar 26, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant