perf: cache CST traversals, add __slots__, use tuple keys by KRRT7 · Pull Request #1881 · codeflash-ai/codeflash

KRRT7 · 2026-03-20T07:01:11Z

Summary

Cache expensive CST traversals in unused_definition_remover (split collect/mark phases) and code_extractor (pre-compute gather_source_imports once per file instead of 3x)
Add slots=True to InvocationId and FunctionTestInvocation frozen dataclasses
Replace string-concat grouping keys with tuples in merge_test_results

Benchmarks

Benchmark	Baseline (median)	Optimized (median)	Speedup
`extract_code_context`	12,196 ms	8,777 ms	~1.39x
`merge_test_results`	71.2 ms	49.8 ms	~1.43x

Test plan

tests/test_code_context_extractor.py — 104 passed
ruff check + ruff format clean
Existing benchmark tests pass

- Split collect_top_level_defs_with_usages into collect/mark phases to avoid redundant CST traversals when processing multiple name sets - Extract gather_source_imports so source module is traversed once per file instead of 3x (RW, RO, TESTGEN) - Add slots=True to InvocationId and FunctionTestInvocation dataclasses - Replace string concat keys with tuples in merge_test_results Benchmarks: extract_code_context ~1.39x faster, merge_test_results ~1.43x faster

claude · 2026-03-20T07:01:44Z

Claude finished @KRRT7's task in 4m 0s —— View job

PR Review Summary

Checklist

Prek Checks

uv run prek run --from-ref HEAD~1 — passed (ruff check + ruff format clean).

Mypy has 60 errors across these files, but all are pre-existing in the codebase, not introduced by this PR. The PR itself adds no new type errors.

Code Review

🚩 Accidental version file changes

codeflash/version.py and codeflash-benchmark/codeflash_benchmark/version.py both contain auto-generated dev versions (0.20.3.post137.dev0+6e40f64a) that appear to be local build artifacts. These should not be committed to the PR — revert them to their pre-PR values. Fix this →

⚠️ Weak type annotation for `gathered_imports` sentinel

code_extractor.py line ~616:

gathered_imports: GatherImportsVisitor | None | object = _SENTINEL,

object subsumes all types, making this annotation effectively object. Consider a typed sentinel:

class _Sentinel:
    pass
_SENTINEL = _Sentinel()
# ...
gathered_imports: GatherImportsVisitor | None | _Sentinel = _SENTINEL,

Similarly, FileContextCache.gathered_imports: Any = None (code_context_extractor.py:70) should be GatherImportsVisitor | None = None for proper typing.

✅ Shallow copy in `mark_defs_for_functions` is safe

marked = {k: UsageInfo(name=v.name, dependencies=v.dependencies) for k, v in base_defs.items()}

UsageInfo.dependencies is a set[str] that QualifiedFunctionUsageMarker only reads (iterates) — never mutates. The only field mutated during marking is used_by_qualified_function (a bool), which is freshly initialized to False in each new UsageInfo. The shallow copy is therefore correct and safe.

✅ Backward compatibility preserved

The _SENTINEL default in add_needed_imports_from_module correctly preserves the existing call signature — callers that don't pass gathered_imports get the old on-the-fly computation path.

✅ Tuple keys in `merge_test_results`

Replacing string-concatenated keys with (module, class, func, loop_index) tuples is cleaner and avoids potential collisions (e.g., if a module path contained :). The loop_index is now stored as int directly rather than str(loop_index), which is slightly more correct.

✅ Correctness of split collect/mark phases

The refactoring in unused_definition_remover.py is logically sound: the expensive MetadataWrapper + DependencyCollector traversal is done once in collect_top_level_defs_with_dependencies, and mark_defs_for_functions is a cheap dict copy + graph walk. The old collect_top_level_defs_with_usages is preserved as a thin wrapper for backward compatibility.

Duplicate Detection

No duplicates detected. All three new functions (collect_top_level_defs_with_dependencies, mark_defs_for_functions, gather_source_imports) are unique in the codebase.

Test Coverage

PR includes test plan showing 104 tests pass in tests/test_code_context_extractor.py. No new public functions lack tests (the refactored functions are exercised through the existing collect_top_level_defs_with_usages wrapper and context extractor tests).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: cache CST traversals, add slots, use tuple keys#1881

perf: cache CST traversals, add slots, use tuple keys#1881
KRRT7 wants to merge 1 commit intomainfrom
autosearch/ds-mar19

KRRT7 commented Mar 20, 2026

Uh oh!

claude bot commented Mar 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

KRRT7 commented Mar 20, 2026

Summary

Benchmarks

Test plan

Uh oh!

claude bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Summary

Checklist

Prek Checks

Code Review

🚩 Accidental version file changes

⚠️ Weak type annotation for gathered_imports sentinel

✅ Shallow copy in mark_defs_for_functions is safe

✅ Backward compatibility preserved

✅ Tuple keys in merge_test_results

✅ Correctness of split collect/mark phases

Duplicate Detection

Test Coverage

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 20, 2026 •

edited

Loading

⚠️ Weak type annotation for `gathered_imports` sentinel

✅ Shallow copy in `mark_defs_for_functions` is safe

✅ Tuple keys in `merge_test_results`