Skip to content

perf: cache CST traversals, add __slots__, use tuple keys#1881

Open
KRRT7 wants to merge 1 commit intomainfrom
autosearch/ds-mar19
Open

perf: cache CST traversals, add __slots__, use tuple keys#1881
KRRT7 wants to merge 1 commit intomainfrom
autosearch/ds-mar19

Conversation

@KRRT7
Copy link
Collaborator

@KRRT7 KRRT7 commented Mar 20, 2026

Summary

  • Cache expensive CST traversals in unused_definition_remover (split collect/mark phases) and code_extractor (pre-compute gather_source_imports once per file instead of 3x)
  • Add slots=True to InvocationId and FunctionTestInvocation frozen dataclasses
  • Replace string-concat grouping keys with tuples in merge_test_results

Benchmarks

Benchmark Baseline (median) Optimized (median) Speedup
extract_code_context 12,196 ms 8,777 ms ~1.39x
merge_test_results 71.2 ms 49.8 ms ~1.43x

Test plan

  • tests/test_code_context_extractor.py — 104 passed
  • ruff check + ruff format clean
  • Existing benchmark tests pass

- Split collect_top_level_defs_with_usages into collect/mark phases to
  avoid redundant CST traversals when processing multiple name sets
- Extract gather_source_imports so source module is traversed once per
  file instead of 3x (RW, RO, TESTGEN)
- Add slots=True to InvocationId and FunctionTestInvocation dataclasses
- Replace string concat keys with tuples in merge_test_results

Benchmarks: extract_code_context ~1.39x faster, merge_test_results ~1.43x faster
@claude
Copy link
Contributor

claude bot commented Mar 20, 2026

Claude finished @KRRT7's task in 4m 0s —— View job


PR Review Summary

Checklist

  • Triage — LARGE (193 lines changed, 5 production files)
  • Lint and typecheck
  • Resolve stale threads — none found
  • Review diff
  • Duplicate detection
  • Summary comment

Prek Checks

uv run prek run --from-ref HEAD~1passed (ruff check + ruff format clean).

Mypy has 60 errors across these files, but all are pre-existing in the codebase, not introduced by this PR. The PR itself adds no new type errors.


Code Review

🚩 Accidental version file changes

codeflash/version.py and codeflash-benchmark/codeflash_benchmark/version.py both contain auto-generated dev versions (0.20.3.post137.dev0+6e40f64a) that appear to be local build artifacts. These should not be committed to the PR — revert them to their pre-PR values. Fix this →

⚠️ Weak type annotation for gathered_imports sentinel

code_extractor.py line ~616:

gathered_imports: GatherImportsVisitor | None | object = _SENTINEL,

object subsumes all types, making this annotation effectively object. Consider a typed sentinel:

class _Sentinel:
    pass
_SENTINEL = _Sentinel()
# ...
gathered_imports: GatherImportsVisitor | None | _Sentinel = _SENTINEL,

Similarly, FileContextCache.gathered_imports: Any = None (code_context_extractor.py:70) should be GatherImportsVisitor | None = None for proper typing.

✅ Shallow copy in mark_defs_for_functions is safe

marked = {k: UsageInfo(name=v.name, dependencies=v.dependencies) for k, v in base_defs.items()}

UsageInfo.dependencies is a set[str] that QualifiedFunctionUsageMarker only reads (iterates) — never mutates. The only field mutated during marking is used_by_qualified_function (a bool), which is freshly initialized to False in each new UsageInfo. The shallow copy is therefore correct and safe.

✅ Backward compatibility preserved

The _SENTINEL default in add_needed_imports_from_module correctly preserves the existing call signature — callers that don't pass gathered_imports get the old on-the-fly computation path.

✅ Tuple keys in merge_test_results

Replacing string-concatenated keys with (module, class, func, loop_index) tuples is cleaner and avoids potential collisions (e.g., if a module path contained :). The loop_index is now stored as int directly rather than str(loop_index), which is slightly more correct.

✅ Correctness of split collect/mark phases

The refactoring in unused_definition_remover.py is logically sound: the expensive MetadataWrapper + DependencyCollector traversal is done once in collect_top_level_defs_with_dependencies, and mark_defs_for_functions is a cheap dict copy + graph walk. The old collect_top_level_defs_with_usages is preserved as a thin wrapper for backward compatibility.


Duplicate Detection

No duplicates detected. All three new functions (collect_top_level_defs_with_dependencies, mark_defs_for_functions, gather_source_imports) are unique in the codebase.


Test Coverage

PR includes test plan showing 104 tests pass in tests/test_code_context_extractor.py. No new public functions lack tests (the refactored functions are exercised through the existing collect_top_level_defs_with_usages wrapper and context extractor tests).


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant