⚡️ Speed up function get_analyzer_for_file by 45% in PR #1318 (fix/js-jest30-loop-runner) #1392

Open
codeflash-ai[bot] wants to merge 1 commit into fix/js-jest30-loop-runner from codeflash/optimize-pr1318-2026-02-05T12.35.19

@codeflash-ai codeflash-ai bot commented Feb 5, 2026

⚡️ This pull request contains optimizations for PR #1318

If you approve this dependent PR, these changes will be merged into the original PR branch fix/js-jest30-loop-runner.

This PR will be automatically closed if the original PR is merged.


📄 45% (0.45x) speedup for get_analyzer_for_file in codeflash/languages/treesitter_utils.py

⏱️ Runtime: 538 microseconds → 370 microseconds (best of 250 runs)

📝 Explanation and details

The optimization achieves a 45% runtime improvement (from 538μs to 370μs) by eliminating repeated TreeSitterAnalyzer object instantiation through singleton pattern caching.

Key optimization: Instead of creating a new TreeSitterAnalyzer instance on every call to get_analyzer_for_file(), the optimized code pre-instantiates three singleton analyzers (_TYPESCRIPT_ANALYZER, _TSX_ANALYZER, _JAVASCRIPT_ANALYZER) at module load time and returns references to these cached instances.

Why this improves runtime:

  1. Eliminates constructor overhead: The original code calls TreeSitterAnalyzer.__init__() on every invocation (4,237 times in profiling), which involves isinstance() checks, attribute assignments, and object allocation. Line profiler shows __init__ took 3.83ms total in the original vs just 6.9μs for the 3 singleton instances in the optimized version.

  2. Removes enum conversion: The original creates TreeSitterLanguage enum values repeatedly. Pre-creating analyzers with enum values eliminates this redundant work.

  3. Reduces memory churn: Fewer object allocations means less work for Python's memory allocator and garbage collector.
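To make the constructor-overhead point concrete, the sketch below counts how many analyzer objects get built under each strategy. `CountingAnalyzer`, `language_for`, `get_fresh`, and `get_cached` are hypothetical names invented for this illustration; the loop size of 4,237 simply mirrors the call count quoted from profiling and is not a measurement.

```python
from pathlib import Path


class CountingAnalyzer:
    """Hypothetical stand-in for TreeSitterAnalyzer that counts constructions."""

    constructed = 0

    def __init__(self, language: str) -> None:
        CountingAnalyzer.constructed += 1
        self.language = language


def language_for(path: Path) -> str:
    # Same suffix rules the generated tests exercise.
    suffix = path.suffix.lower()
    if suffix == ".ts":
        return "typescript"
    if suffix == ".tsx":
        return "tsx"
    return "javascript"


def get_fresh(path: Path) -> CountingAnalyzer:
    # Original strategy: build a new analyzer on every call.
    return CountingAnalyzer(language_for(path))


# Optimized strategy: build three analyzers once, up front.
_CACHE = {lang: CountingAnalyzer(lang) for lang in ("typescript", "tsx", "javascript")}


def get_cached(path: Path) -> CountingAnalyzer:
    return _CACHE[language_for(path)]


def count_constructions(factory, paths) -> int:
    # Number of analyzers constructed while calling factory on each path.
    before = CountingAnalyzer.constructed
    for p in paths:
        factory(p)
    return CountingAnalyzer.constructed - before


paths = [Path(f"file_{i}.ts") for i in range(4237)]
fresh_count = count_constructions(get_fresh, paths)
cached_count = count_constructions(get_cached, paths)
```

In this sketch the fresh strategy constructs one object per call, while the cached strategy constructs none after the three built at import time.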

Impact on existing workloads:
Based on the function_references, this function is called extensively in JavaScript test discovery code paths (from test_javascript_support.py and test_javascript_test_discovery.py). The test files show it's called:

  • Once per test file being analyzed (20+ test cases shown)
  • In loops processing multiple test files
  • Within nested test discovery operations

Since these are test discovery hot paths, the 45% speedup of this function reduces per-file overhead in CI/CD pipelines and developer workflows that scan JavaScript/TypeScript projects.

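Test results validation: Every test that returns an analyzer shows a speedup of roughly 38-66%. The one exception is the error-path test, which raises AttributeError for a plain string and is effectively unchanged (2.60μs → 2.63μs, 1.14% slower). Improvements are particularly strong for:

  • Batch processing scenarios (447μs → 308μs, 45% faster)
  • Repeated calls with the same extension (about 51-53% faster in the repeated-call test)
  • Large-scale consistency tests processing 500+ files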

The optimization maintains correctness by ensuring all callers receive valid analyzer instances with proper language configuration, just served from a reusable cache rather than created fresh each time.
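The equivalence claim above can be checked along these lines. This is a sketch under stated assumptions: `Analyzer`, `original_get`, and `optimized_get` are hypothetical stand-ins for the pre- and post-optimization behavior, not the project's code. It compares the language each strategy reports for a handful of file names taken from the generated tests, and also shows the one observable difference, object identity.

```python
from pathlib import Path


class Analyzer:
    # Hypothetical stand-in for TreeSitterAnalyzer.
    def __init__(self, language: str) -> None:
        self.language = language


def _language_for(path: Path) -> str:
    suffix = path.suffix.lower()
    if suffix == ".ts":
        return "typescript"
    if suffix == ".tsx":
        return "tsx"
    return "javascript"


def original_get(path: Path) -> Analyzer:
    # Pre-optimization behavior: a fresh object per call.
    return Analyzer(_language_for(path))


_CACHED = {name: Analyzer(name) for name in ("typescript", "tsx", "javascript")}


def optimized_get(path: Path) -> Analyzer:
    # Post-optimization behavior: a shared object per language.
    return _CACHED[_language_for(path)]


names = ["example.ts", "Component.TSX", "file.mjs", "index.d.ts", "component.tsx.tmp", "Makefile"]
paths = [Path(n) for n in names]

# Both strategies agree on the configured language for every path.
same_language = all(original_get(p).language == optimized_get(p).language for p in paths)

# The difference: fresh calls yield distinct objects, cached calls yield the same one.
fresh_distinct = original_get(paths[0]) is not original_get(paths[0])
cached_shared = optimized_get(paths[0]) is optimized_get(paths[0])
```

Because cached calls return the same object, the two strategies are interchangeable as long as callers treat the analyzer as read-only.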

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 544 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
from pathlib import Path  # used to construct Path objects for testing

# imports
import pytest  # used for our unit tests
# Import the real implementations from the module under test.
# Per the task constraints, we must import the real classes and function
# from the same module that defines them.
from codeflash.languages.treesitter_utils import (TreeSitterAnalyzer,
                                                  TreeSitterLanguage,
                                                  get_analyzer_for_file)

def test_basic_typescript_file_returns_typescript_analyzer():
    # Basic scenario: .ts suffix should return a TreeSitterAnalyzer configured for TypeScript
    p = Path("example.ts")  # create a Path with a .ts suffix
    codeflash_output = get_analyzer_for_file(p); analyzer = codeflash_output # 2.62μs -> 1.81μs (44.7% faster)

def test_uppercase_and_tsx_suffixes_are_handled_case_insensitively():
    # Edge case: uppercase suffix should be lower-cased by the function
    p_upper = Path("Component.TSX")
    codeflash_output = get_analyzer_for_file(p_upper); analyzer_upper = codeflash_output # 3.04μs -> 2.01μs (50.7% faster)

    # Another check with mixed-case .Ts should still map to TypeScript
    p_mixed = Path("module.Ts")
    codeflash_output = get_analyzer_for_file(p_mixed); analyzer_mixed = codeflash_output # 1.40μs -> 1.00μs (40.0% faster)

def test_default_to_javascript_for_js_and_variants():
    # Basic scenario: .js, .jsx, .mjs, .cjs should map to JavaScript as the default
    js_variants = ["file.js", "file.jsx", "file.mjs", "file.cjs"]
    for fname in js_variants:
        p = Path(fname)
        codeflash_output = get_analyzer_for_file(p); analyzer = codeflash_output # 6.27μs -> 4.30μs (46.0% faster)

def test_multidot_and_declaration_files_map_correctly():
    # Edge case: filenames with multiple dots should use only the last suffix
    # index.d.ts -> last suffix is '.ts' so it should map to TypeScript
    p_d_ts = Path("index.d.ts")
    codeflash_output = get_analyzer_for_file(p_d_ts); analyzer_d_ts = codeflash_output # 2.79μs -> 1.97μs (41.2% faster)

    # component.tsx.tmp -> last suffix is '.tmp' -> default to JavaScript
    p_tmp = Path("component.tsx.tmp")
    codeflash_output = get_analyzer_for_file(p_tmp); analyzer_tmp = codeflash_output # 1.46μs -> 1.03μs (41.7% faster)

def test_files_without_suffix_default_to_javascript():
    # Edge case: files without any suffix (e.g., Makefile) should default to JavaScript
    p_no_suffix = Path("Makefile")
    codeflash_output = get_analyzer_for_file(p_no_suffix); analyzer_no_suffix = codeflash_output # 2.58μs -> 1.70μs (51.2% faster)

def test_unexpected_extensions_default_to_javascript():
    # Edge case: unrelated extensions like .py, .rb should still default to JavaScript
    for ext in ["script.py", "program.RB", "notes.Markdown"]:
        p = Path(ext)
        codeflash_output = get_analyzer_for_file(p); analyzer = codeflash_output # 5.28μs -> 3.67μs (44.1% faster)

def test_non_path_like_object_raises_attribute_error():
    # The function expects a Path-like object (it accesses .suffix). Passing a plain string
    # should raise an AttributeError because str does not have .suffix attribute.
    with pytest.raises(AttributeError):
        # Calling with a string should cause attribute access error inside the function
        get_analyzer_for_file("not_a_path_string") # 2.60μs -> 2.63μs (1.14% slower)

def test_large_scale_deterministic_batch_of_paths():
    # Large-scale scenario: create many (but < 1000) deterministic paths to assess stability and performance.
    # We construct 500 test paths by cycling through a fixed set of filename patterns.
    patterns = [
        "item{0}.ts",       # should become TYPESCRIPT (suffix .ts)
        "item{0}.tsx",      # should become TSX (suffix .tsx)
        "item{0}.js",       # should become JAVASCRIPT
        "item{0}.jsx",      # should become JAVASCRIPT
        "item{0}.mjs",      # should become JAVASCRIPT
        "item{0}.cjs",      # should become JAVASCRIPT
        "item{0}.d.ts",     # last suffix .ts -> TYPESCRIPT
        "item{0}.tsx.tmp",  # last suffix .tmp -> JAVASCRIPT
    ]
    total = 500  # keep well under the 1000-step loop constraint
    # Dictionary to tally how many times each language appears according to the function under test
    tally = {
        TreeSitterLanguage.TYPESCRIPT: 0,
        TreeSitterLanguage.TSX: 0,
        TreeSitterLanguage.JAVASCRIPT: 0,
    }

    # Iterate deterministically to avoid flakiness
    for i in range(total):
        pattern = patterns[i % len(patterns)]
        path_str = pattern.format(i)
        p = Path(path_str)
        codeflash_output = get_analyzer_for_file(p); analyzer = codeflash_output # 447μs -> 308μs (45.1% faster)
        tally[analyzer.language] += 1

    # Additional deterministic consistency check:
    # Recompute expected language using the same rules as the function and compare counts.
    # This ensures that if the function logic is mutated (changing mapping), the test will fail.
    expected_tally = {
        TreeSitterLanguage.TYPESCRIPT: 0,
        TreeSitterLanguage.TSX: 0,
        TreeSitterLanguage.JAVASCRIPT: 0,
    }
    for i in range(total):
        pattern = patterns[i % len(patterns)]
        p = Path(pattern.format(i))
        # replicate the function's language-selection logic exactly (case-insensitive)
        suffix = p.suffix.lower()
        if suffix == ".ts":
            expected = TreeSitterLanguage.TYPESCRIPT
        elif suffix == ".tsx":
            expected = TreeSitterLanguage.TSX
        else:
            expected = TreeSitterLanguage.JAVASCRIPT
        expected_tally[expected] += 1
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from pathlib import Path

import pytest
from codeflash.languages.treesitter_utils import (TreeSitterAnalyzer,
                                                  TreeSitterLanguage,
                                                  get_analyzer_for_file)

def test_typescript_file_returns_typescript_analyzer():
    """Test that a .ts file returns a TreeSitterAnalyzer configured for TypeScript."""
    file_path = Path("example.ts")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 3.23μs -> 1.95μs (65.6% faster)

def test_tsx_file_returns_tsx_analyzer():
    """Test that a .tsx file returns a TreeSitterAnalyzer configured for TSX."""
    file_path = Path("component.tsx")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.97μs -> 2.00μs (48.0% faster)

def test_javascript_file_returns_javascript_analyzer():
    """Test that a .js file returns a TreeSitterAnalyzer configured for JavaScript."""
    file_path = Path("script.js")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.88μs -> 1.96μs (46.4% faster)

def test_jsx_file_returns_javascript_analyzer():
    """Test that a .jsx file defaults to JavaScript analyzer."""
    file_path = Path("component.jsx")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.77μs -> 1.96μs (40.8% faster)

def test_mjs_file_returns_javascript_analyzer():
    """Test that a .mjs (ES module) file defaults to JavaScript analyzer."""
    file_path = Path("module.mjs")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.83μs -> 1.92μs (47.4% faster)

def test_cjs_file_returns_javascript_analyzer():
    """Test that a .cjs (CommonJS) file defaults to JavaScript analyzer."""
    file_path = Path("commonjs.cjs")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.81μs -> 1.93μs (45.0% faster)

def test_unknown_extension_returns_javascript_analyzer():
    """Test that files with unknown extensions default to JavaScript analyzer."""
    file_path = Path("unknown.py")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.83μs -> 1.89μs (49.2% faster)

def test_no_extension_returns_javascript_analyzer():
    """Test that files without extensions default to JavaScript analyzer."""
    file_path = Path("Makefile")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.52μs -> 1.53μs (64.8% faster)

def test_uppercase_ts_extension():
    """Test that uppercase .TS extension is handled correctly (case-insensitive)."""
    file_path = Path("EXAMPLE.TS")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.75μs -> 1.92μs (42.7% faster)

def test_uppercase_tsx_extension():
    """Test that uppercase .TSX extension is handled correctly (case-insensitive)."""
    file_path = Path("COMPONENT.TSX")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.69μs -> 1.88μs (43.0% faster)

def test_mixed_case_ts_extension():
    """Test that mixed case .Ts extension is handled correctly (case-insensitive)."""
    file_path = Path("example.Ts")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.65μs -> 1.92μs (38.1% faster)

def test_mixed_case_tsx_extension():
    """Test that mixed case .TsX extension is handled correctly (case-insensitive)."""
    file_path = Path("component.TsX")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.69μs -> 1.91μs (40.8% faster)

def test_uppercase_js_extension():
    """Test that uppercase .JS extension defaults to JavaScript analyzer."""
    file_path = Path("script.JS")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.79μs -> 1.89μs (47.6% faster)

def test_path_with_multiple_dots():
    """Test file paths with multiple dots in the filename (only last extension matters)."""
    file_path = Path("my.test.ts")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.77μs -> 1.79μs (54.7% faster)

def test_path_with_directory_structure():
    """Test that the function correctly extracts extension from full file paths."""
    file_path = Path("/home/user/projects/src/components/MyComponent.tsx")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.92μs -> 1.95μs (49.3% faster)

def test_relative_path_with_dots():
    """Test relative paths with directory traversal notation."""
    file_path = Path("../src/utils/helper.ts")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.83μs -> 1.96μs (44.4% faster)

def test_hidden_file_typescript():
    """Test hidden files (starting with dot) with TypeScript extension."""
    file_path = Path(".config.ts")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.77μs -> 1.88μs (47.3% faster)

def test_windows_style_path():
    """Test Windows-style path separators are handled correctly."""
    file_path = Path("C:\\Users\\project\\src\\main.ts")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.83μs -> 1.92μs (46.9% faster)

def test_empty_filename_with_extension():
    """Test that files with just an extension (e.g., '.ts') are handled."""
    file_path = Path(".ts")
    codeflash_output = get_analyzer_for_file(file_path); analyzer = codeflash_output # 2.51μs -> 1.56μs (60.8% faster)

def test_repeated_calls_return_equivalent_analyzers():
    """Test repeated calls for the same path (the optimized code returns a shared cached instance)."""
    file_path = Path("script.js")
    codeflash_output = get_analyzer_for_file(file_path); analyzer1 = codeflash_output # 2.77μs -> 1.81μs (53.0% faster)
    codeflash_output = get_analyzer_for_file(file_path); analyzer2 = codeflash_output # 1.22μs -> 811ns (50.7% faster)

def test_different_extensions_same_directory():
    """Test multiple files with different extensions in the same logical directory."""
    ts_file = Path("src/main.ts")
    tsx_file = Path("src/component.tsx")
    js_file = Path("src/util.js")
    
    codeflash_output = get_analyzer_for_file(ts_file); ts_analyzer = codeflash_output # 2.79μs -> 1.89μs (47.1% faster)
    codeflash_output = get_analyzer_for_file(tsx_file); tsx_analyzer = codeflash_output # 1.37μs -> 961ns (42.9% faster)
    codeflash_output = get_analyzer_for_file(js_file); js_analyzer = codeflash_output # 1.09μs -> 761ns (43.5% faster)

def test_batch_processing_various_extensions():
    """Test processing a large batch of files with various extensions."""
    # Create a list of file paths with different extensions
    extensions = [".ts", ".tsx", ".js", ".jsx", ".mjs", ".cjs", ".unknown", ""]
    file_paths = [Path(f"file_{i}{ext}") for i, ext in enumerate(extensions * 100)]
    
    # Process all files
    analyzers = [get_analyzer_for_file(path) for path in file_paths]

def test_consistency_across_many_typescript_files():
    """Test that TypeScript files consistently return TypeScript analyzers."""
    file_paths = [Path(f"file_{i}.ts") for i in range(500)]
    analyzers = [get_analyzer_for_file(path) for path in file_paths]

def test_consistency_across_many_tsx_files():
    """Test that TSX files consistently return TSX analyzers."""
    file_paths = [Path(f"component_{i}.tsx") for i in range(500)]
    analyzers = [get_analyzer_for_file(path) for path in file_paths]

def test_mixed_case_consistency_large_scale():
    """Test that case-insensitive handling is consistent across many files."""
    # Create files with various case combinations
    case_variants = [".ts", ".TS", ".Ts", ".tS"]
    file_paths = []
    for variant in case_variants:
        for i in range(200):
            file_paths.append(Path(f"file_{variant[1:]}_{i}{variant}"))
    
    analyzers = [get_analyzer_for_file(path) for path in file_paths]

def test_deeply_nested_paths_large_scale():
    """Test handling of deeply nested file paths at scale."""
    # Create paths with varying depths
    base_paths = ["/".join([f"dir_{j}" for j in range(depth)]) for depth in range(1, 11)]
    file_paths = [
        Path(f"{base_path}/file_{i}.ts")
        for base_path in base_paths
        for i in range(50)
    ]
    
    analyzers = [get_analyzer_for_file(path) for path in file_paths]

def test_all_extension_types_in_large_batch():
    """Test that a large batch with all extension types are correctly classified."""
    ts_files = [Path(f"ts_{i}.ts") for i in range(100)]
    tsx_files = [Path(f"tsx_{i}.tsx") for i in range(100)]
    js_files = [Path(f"js_{i}.js") for i in range(100)]
    jsx_files = [Path(f"jsx_{i}.jsx") for i in range(100)]
    mjs_files = [Path(f"mjs_{i}.mjs") for i in range(100)]
    cjs_files = [Path(f"cjs_{i}.cjs") for i in range(100)]
    
    all_files = ts_files + tsx_files + js_files + jsx_files + mjs_files + cjs_files
    analyzers = [get_analyzer_for_file(path) for path in all_files]
    
    ts_analyzers = [a for p, a in zip(all_files, analyzers) if p.suffix.lower() == ".ts"]
    tsx_analyzers = [a for p, a in zip(all_files, analyzers) if p.suffix.lower() == ".tsx"]
    js_analyzers = [a for p, a in zip(all_files, analyzers) if p.suffix.lower() in [".js", ".jsx", ".mjs", ".cjs"]]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-pr1318-2026-02-05T12.35.19 and push.

codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels on Feb 5, 2026