feat(engine): persist Python definitions into fqn_index (PR-02)#720

Open
shivasurya wants to merge 1 commit into shiva/rauth-pr-01-index-schema from shiva/rauth-pr-02-python-fqn-extract

@shivasurya

Owner

Stack: PR-02 of 14 (Rule Authoring Toolchain, Phase 0). Stacked on #719 (PR-01) — review/merge that first; this PR's base is the PR-01 branch so the diff is scoped to PR-02.

What

Populate the persisted FQN index (added empty in PR-01) with every Python function, method, and class the engine discovers during a scan, plus per-file mtime stamps so a later query can detect stale files. The index is a side effect of the same call-graph build, not a second analysis. The in-memory call graph is identical with or without a cache.
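
A minimal sketch of the "side effect, not a second analysis" shape described above. The function names come from this PR, but the signatures, the `AnalysisCache` fields, and the `defs` input are invented for illustration and are not the engine's real API.

```go
package main

import (
	"fmt"
	"sort"
)

// AnalysisCache is a hypothetical stand-in for the engine's cache type.
type AnalysisCache struct {
	FqnRows []string // FQNs persisted as a side effect of the build
}

// CallGraph is a simplified stand-in: FQN -> start line.
type CallGraph struct {
	Functions map[string]int
}

// BuildCallGraphWithCache builds the graph and, only when cache is non-nil,
// records the discovered FQNs. The graph never depends on the cache.
func BuildCallGraphWithCache(defs map[string]int, cache *AnalysisCache) *CallGraph {
	cg := &CallGraph{Functions: make(map[string]int, len(defs))}
	for fqn, line := range defs {
		cg.Functions[fqn] = line
	}
	if cache != nil {
		rows := make([]string, 0, len(cg.Functions))
		for fqn := range cg.Functions {
			rows = append(rows, fqn)
		}
		sort.Strings(rows)
		cache.FqnRows = rows // replace, not append: repopulated on every run
	}
	return cg
}

// BuildCallGraph is the nil-cache wrapper, so existing callers are unchanged.
func BuildCallGraph(defs map[string]int) *CallGraph {
	return BuildCallGraphWithCache(defs, nil)
}

func main() {
	defs := map[string]int{"models.sanitize": 5, "models.User.save": 12}
	cache := &AnalysisCache{}
	withCache := BuildCallGraphWithCache(defs, cache)
	without := BuildCallGraph(defs)
	fmt.Println("same size:", len(withCache.Functions) == len(without.Functions))
	fmt.Println("persisted:", cache.FqnRows)
}
```

The cache write happens after the graph is complete, which is why a nil cache cannot change the result.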

Changes

  • BuildCallGraphWithCache (builder.go): new entry point taking an *AnalysisCache; after the existing passes it writes the Python FQN slice. BuildCallGraph stays a nil-cache wrapper so serve/integration/tests are untouched (the PR-01 OpenAnalysisCache pattern).
  • fqn_collect.go: enumerates functions/methods (from callGraph.Functions) and classes (from the code graph) into fqn_index rows with kind, parent FQN, signature, start line. Every entry is source="project": these are definitions in the scanned project's own .py files.
  • fqn_writer.go: ReplacePythonFqnIndex atomically swaps the python slice of fqn_index and stamps indexed_files in a single transaction; also adds a LookupFqn reader for PR-04. Go's function_index/pass4 caches are never touched.
  • file_mtime.go: indexed_files table + IsStale/GetIndexedFileMtime, mtime-based per the spec (no checksum).
  • scan/ci: the analysis cache is now opened once before the call-graph build and shared by the Python and Go builders (was Go-only). --index-path/--rebuild-index added to ci for parity with scan.

Decisions (documented)

  • Module-level variables and import aliases are deferred to a Phase 0 follow-up. They depend on internal scope/import-map iteration whose stable contract is not established; emitting uncertain rows would be worse than indexing the callable definitions rules actually match. This is an explicit choice, not a silent gap.
  • No global schema bump. indexed_files is additive (CREATE TABLE IF NOT EXISTS) and fqn_index was empty in PR-01 and is fully repopulated each run, so bumping would only gratuitously cold-start the warm Go cache. Matches the documented per-table policy. A per-table indexed_files_version guards that one table.
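
To make the schema decision concrete, here is a toy model of a per-table version guard. It is a sketch under assumptions: an in-memory `store` stands in for the cache database, and the reset-only-that-table behavior on a version mismatch is one plausible reading of "guards that one table", not something this PR description spells out.

```go
package main

import "fmt"

// store is an in-memory stand-in for the cache database:
// table name -> rows, plus a version recorded per table.
type store struct {
	tables   map[string][]string
	versions map[string]int
}

// ensureTable mimics CREATE TABLE IF NOT EXISTS plus a per-table version check.
// A missing table is created; a table whose recorded version differs from want
// is reset. Other tables are never touched. It reports whether it (re)created.
func (s *store) ensureTable(name string, want int) bool {
	_, exists := s.tables[name]
	if exists && s.versions[name] == want {
		return false
	}
	s.tables[name] = nil
	s.versions[name] = want
	return true
}

func main() {
	s := &store{
		tables:   map[string][]string{"function_index": {"warm-go-row"}},
		versions: map[string]int{"function_index": 1},
	}

	// Additive: creating indexed_files leaves the warm Go cache alone.
	fmt.Println("created:", s.ensureTable("indexed_files", 1))
	fmt.Println("go rows kept:", len(s.tables["function_index"]))

	// A later bump of only this table's version resets only this table.
	s.tables["indexed_files"] = []string{"models.py"}
	fmt.Println("reset:", s.ensureTable("indexed_files", 2))
	fmt.Println("go rows kept:", len(s.tables["function_index"]))
}
```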

Verification

  • go build ./... clean; go test ./... green; golangci-lint run -> 0 issues.
  • Unit tests: collection logic and mtime helpers at 100%; writer ~88% incl. SQL-error injection (dropped tables, closed DB).
  • In-process integration test: parse a real Python project, BuildCallGraphWithCache, assert models.sanitize / models.User / models.User.save / __init__ land in fqn_index, all source=project, and the file is stamped (not stale).
  • End-to-end binary scan output (real .py project):
models.User|class|project|8||models
models.User.__init__|method|project|9|(name: str) -> None|models.User
models.User.save|method|project|12|() -> None|models.User
models.helper|function|project|15||models
models.sanitize|function|project|5|(value: str) -> str|models
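
For readers consuming this output, a small parsing sketch follows. The column order (FQN, kind, source, start line, signature, parent FQN) is inferred from the sample rows above and the fields named in the Changes list; `fqnRow` and `parseRow` are hypothetical helpers, not part of the PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fqnRow mirrors one pipe-delimited line of the scan output above.
type fqnRow struct {
	FQN       string
	Kind      string
	Source    string
	StartLine int
	Signature string
	Parent    string
}

// parseRow splits the first four fields from the left and the parent from the
// right, so a signature that itself contains '|' (e.g. "int | None") survives.
func parseRow(line string) (fqnRow, error) {
	head := strings.SplitN(line, "|", 5)
	if len(head) != 5 {
		return fqnRow{}, fmt.Errorf("too few fields in %q", line)
	}
	cut := strings.LastIndex(head[4], "|")
	if cut < 0 {
		return fqnRow{}, fmt.Errorf("missing parent field in %q", line)
	}
	n, err := strconv.Atoi(head[3])
	if err != nil {
		return fqnRow{}, fmt.Errorf("bad start line in %q: %w", line, err)
	}
	return fqnRow{
		FQN:       head[0],
		Kind:      head[1],
		Source:    head[2],
		StartLine: n,
		Signature: head[4][:cut],
		Parent:    head[4][cut+1:],
	}, nil
}

func main() {
	lines := []string{
		"models.User|class|project|8||models",
		"models.User.save|method|project|12|() -> None|models.User",
		"models.sanitize|function|project|5|(value: str) -> str|models",
	}
	for _, l := range lines {
		row, err := parseRow(l)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-8s %-18s line %-3d parent=%s sig=%q\n",
			row.Kind, row.FQN, row.StartLine, row.Parent, row.Signature)
	}
}
```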

Note: the repo's lintGo task chains lintPython, which is red on main because of pre-existing black formatting failures in three python-sdk/ files unrelated to this PR.

🤖 Generated with Claude Code

@shivasurya shivasurya added enhancement New feature or request go Pull requests that update go code python labels May 30, 2026
@shivasurya shivasurya self-assigned this May 30, 2026
@safedep

safedep Bot commented May 30, 2026

SafeDep Report Summary

Green Malicious Packages Badge Green Vulnerable Packages Badge Green Risky License Badge

No dependency changes detected. Nothing to scan.

This report is generated by SafeDep Github App

@code-pathfinder

code-pathfinder Bot commented May 30, 2026

Pathfinder Report

No security findings on the changed files. This pull request is clean.

Powered by Code Pathfinder.

@codecov

codecov Bot commented May 30, 2026

Codecov Report

❌ Patch coverage is 84.91620% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.54%. Comparing base (7926bf4) to head (119cca7).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sast-engine/cmd/scan.go | 0.00% | 18 Missing ⚠️ |
| sast-engine/cmd/ci.go | 80.00% | 2 Missing and 2 partials ⚠️ |
| sast-engine/graph/callgraph/builder/builder.go | 62.50% | 1 Missing and 2 partials ⚠️ |
| sast-engine/graph/callgraph/builder/fqn_writer.go | 95.74% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@                        Coverage Diff                         @@
##           shiva/rauth-pr-01-index-schema     #720      +/-   ##
==================================================================
+ Coverage                           85.49%   85.54%   +0.05%     
==================================================================
  Files                                 195      198       +3     
  Lines                               27605    27756     +151     
==================================================================
+ Hits                                23600    23745     +145     
- Misses                               3113     3116       +3     
- Partials                              892      895       +3     

@shivasurya shivasurya force-pushed the shiva/rauth-pr-01-index-schema branch 2 times, most recently from db99dae to 7926bf4 Compare May 30, 2026 02:17
Populate the persisted FQN index with every Python function, method, and class
the engine discovers during a scan, plus per-file mtime stamps so a later query
can detect stale files. The index is a side effect of the same call-graph build,
not a second analysis; the in-memory call graph is byte-identical with or
without a cache.

- BuildCallGraphWithCache: new entry point that takes an *AnalysisCache and,
  after the existing passes, writes the Python FQN slice. BuildCallGraph stays
  as a nil-cache wrapper so serve/integration/tests are untouched (same pattern
  as PR-01's OpenAnalysisCache).
- fqn_collect.go: enumerates functions/methods (from callGraph.Functions) and
  classes (from the code graph), producing fqn_index rows with kind, parent FQN,
  signature, start line, and source="project". Every entry is project source:
  these are definitions in the scanned project's own .py files. Module-level
  variables and import aliases are deferred (see below).
- fqn_writer.go: ReplacePythonFqnIndex atomically swaps the python slice of
  fqn_index and stamps indexed_files, all in one transaction; LookupFqn reader
  for PR-04. Go's caches are never touched.
- file_mtime.go: indexed_files table + IsStale/GetIndexedFileMtime for per-file
  staleness, mtime-based per the spec (no checksum).
- scan/ci: the analysis cache is now opened once before the call-graph build and
  shared by the Python and Go builders (was Go-only). Adds --index-path and
  --rebuild-index to ci for parity with scan.

Deferred with reason: module-level variables and import aliases depend on
internal scope/import-map iteration whose stable contract is not established;
emitting uncertain rows would be worse than indexing the callable definitions
rules actually match. Tracked as a Phase 0 follow-up. No global schema bump:
indexed_files is additive and fqn_index was empty in PR-01 and is fully
repopulated each run, so a wipe would only cold-start the warm Go cache.

Verified: unit tests (collection + mtime at 100%, writer ~88% incl. error
injection), an in-process integration test over a parsed Python project, and an
end-to-end binary scan whose fqn_index output shows correct kinds, lines,
parent FQNs, and signatures for functions/methods/classes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shivasurya shivasurya force-pushed the shiva/rauth-pr-02-python-fqn-extract branch from 0d8a827 to 119cca7 Compare May 30, 2026 02:18