
fix(extraction): handle non-ASCII paths in git fast paths #553

Open
thismilktea wants to merge 1 commit into colbymchenry:main from thismilktea:fix/cjk-git-paths

Conversation

@thismilktea
Contributor

Summary

  • switch git-backed file discovery to -z / NUL-delimited parsing
  • switch git-backed change detection to git status --porcelain -z
  • add regression coverage for CJK directory names in both discovery and sync paths

Problem

Files under non-ASCII directory names could be silently skipped during indexing because git fast paths were parsing
human-readable text output rather than machine-readable path output.

When Git returned quoted/escaped non-ASCII paths, CodeGraph could treat that display-format output as a literal file
path and skip the file before it ever reached extraction.
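
To illustrate the failure mode, here is a minimal sketch (not the code in src/extraction/index.ts). It assumes Git's default core.quotePath behavior, where a path such as src/中文/a.ts is printed wrapped in double quotes with each UTF-8 byte written as an octal escape. The function name parseNewlineDelimited and the sample output are hypothetical and exist only for this example.

```typescript
// Sample of what `git ls-files` prints by default for two tracked files,
// src/main.ts and src/中文/a.ts. The second line is Git's display form:
// double-quoted, with octal escapes for the UTF-8 bytes of "中文".
const displayOutput = 'src/main.ts\n"src/\\344\\270\\255\\346\\226\\207/a.ts"\n';

// Hypothetical naive parser: treats every non-empty line as a literal path.
function parseNewlineDelimited(output: string): string[] {
  return output.split("\n").filter((line) => line.length > 0);
}

const naive = parseNewlineDelimited(displayOutput);

// The ASCII path survives, but the second entry is the quoted display string,
// not the real path, so any lookup on disk for it would miss the file.
const realPath = "src/中文/a.ts";
const foundRealPath = naive.includes(realPath); // false
```

Because the quoted string never matches a file on disk, a scanner that trusts it would drop the file without raising an error, which matches the silent skip described above.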

Fix

This PR updates the git-backed scan/change-detection paths in src/extraction/index.ts to use -z machine-readable
output and parse NUL-delimited records instead of newline-delimited text output.

That prevents Git's quoted/escaped display output from being treated as a literal path, which restores correct handling for
files under CJK and other non-ASCII directory names.
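
A rough sketch of NUL-delimited parsing follows. It is not the PR's actual implementation: the names parseLsFilesZ, parsePorcelainZ, and StatusEntry are hypothetical, and the input strings stand in for the stdout of commands such as git ls-files -z and git status --porcelain -z. The sketch relies on the documented -z behavior: paths are emitted verbatim, records end in NUL, and a rename or copy entry is followed by an extra NUL-terminated field holding the original path.

```typescript
// Hypothetical shape for one `git status --porcelain -z` record.
interface StatusEntry {
  status: string;    // two-character XY status code, e.g. " M", "??", "R "
  path: string;      // path exactly as Git stores it, never quoted under -z
  origPath?: string; // original path, present only for renames and copies
}

// Parse `git ls-files -z` style output: one verbatim path per NUL-terminated record.
function parseLsFilesZ(output: string): string[] {
  return output.split("\0").filter((p) => p.length > 0);
}

// Parse `git status --porcelain -z` (v1) output: "XY <path>\0", and for
// renames/copies "XY <newPath>\0<oldPath>\0".
function parsePorcelainZ(output: string): StatusEntry[] {
  const fields = output.split("\0");
  const entries: StatusEntry[] = [];
  for (let i = 0; i < fields.length; i++) {
    const field = fields[i];
    // A valid record is "XY", a space, and at least one path character.
    if (field.length < 4) continue;
    const status = field.slice(0, 2);
    const entry: StatusEntry = { status, path: field.slice(3) };
    // Renames and copies carry the original path in the next field.
    if (status.includes("R") || status.includes("C")) {
      i += 1;
      entry.origPath = fields[i];
    }
    entries.push(entry);
  }
  return entries;
}

// Sample inputs with CJK directory names, as they would appear under -z.
const files = parseLsFilesZ("src/main.ts\0src/中文/a.ts\0");
const changes = parsePorcelainZ(
  " M src/中文/a.ts\0R  new.ts\0old.ts\0?? docs/日本語/b.md\0"
);
```

Splitting on NUL rather than newline also keeps paths that contain spaces, quotes, or newlines intact, since NUL is the one byte that cannot appear in a path.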

Test plan

  • npx vitest run __tests__/extraction.test.ts -t "issue #541"
  • npx vitest run __tests__/sync.test.ts -t "CJK directories via git"
  • npm run build

Close #541

Development

Successfully merging this pull request may close these issues.

Files under directories with non-ASCII (CJK) names are silently skipped during indexing
