Skip to content

fix: add Jest 30 support and fix time limit in loop-runner#1318

Draft
mohammedahmed18 wants to merge 32 commits intomainfrom
fix/js-jest30-loop-runner
Draft

fix: add Jest 30 support and fix time limit in loop-runner#1318
mohammedahmed18 wants to merge 32 commits intomainfrom
fix/js-jest30-loop-runner

Conversation

@mohammedahmed18
Copy link
Contributor

Summary

  • Add Jest 30 compatibility to the custom loop-runner by detecting Jest version and using the appropriate API (TestRunner class for Jest 30, runTest function for Jest 29)
  • Resolve jest-runner from the project's node_modules instead of codeflash's bundled version to ensure version compatibility
  • Fix time limit enforcement by using local time tracking instead of trying to share state with capture.js (Jest runs tests in worker processes, so state isn't shared between runner and tests)
  • Integrate stability-based early stopping into capturePerf by tracking runtimes per invocation
  • Use plain object instead of Set for stableInvocations to survive Jest module resets

Test plan

  • Verified Jest 30 project (express) benchmarking now works
  • Verified time limit properly stops benchmark loops (tested with 2s and 5s limits)
  • Verified timing markers are correctly emitted and collected

🤖 Generated with Claude Code

- Add Jest 30 compatibility by detecting version and using TestRunner class
- Resolve jest-runner from project's node_modules instead of codeflash's bundle
- Fix time limit enforcement by using local time tracking instead of shared state
  (Jest runs tests in worker processes, so state isn't shared with runner)
- Integrate stability-based early stopping into capturePerf
- Use plain object instead of Set for stableInvocations to survive Jest module resets
- Fix async function benchmarking: properly loop through iterations using async helper
  (Previously, async functions only got one timing marker due to early return)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@mohammedahmed18 mohammedahmed18 force-pushed the fix/js-jest30-loop-runner branch from f337b40 to 04a87cf Compare February 3, 2026 17:06
mohammedahmed18 added a commit that referenced this pull request Feb 3, 2026
…unner

The loop-runner from PR #1318 uses process.cwd() to resolve jest-runner,
but in monorepos the cwd is the package directory, not the monorepo root.

This fix checks CODEFLASH_MONOREPO_ROOT env var first (set by Python runner)
before falling back to process.cwd(). This ensures jest-runner is found in
monorepo root node_modules.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@mohammedahmed18 mohammedahmed18 marked this pull request as draft February 3, 2026 17:57
mohammedahmed18 and others added 4 commits February 3, 2026 21:45
After merging main, constants like PERF_STABILITY_CHECK, PERF_MIN_LOOPS,
PERF_LOOP_COUNT were changed to getter functions. Updated all references
in capturePerf and _capturePerfAsync to use the getter function calls.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…apture

Improvements to loop-runner.js:
- Extract isValidJestRunnerPath() helper to reduce code duplication
- Add comprehensive JSDoc comments for Jest version detection
- Improve error messages with more context about detected versions
- Add better documentation for runTests() method
- Add validation for TestRunner class availability in Jest 30

Improvements to capture.js:
- Extract _recordAsyncTiming() helper to reduce duplication
- Add comprehensive JSDoc for _capturePerfAsync() with all parameters
- Improve error handling in async looping (record timing before throwing)
- Enhance shouldStopStability() documentation with algorithm details
- Improve code organization with clearer comments

These changes improve maintainability and debugging without changing behavior.
…king

The _parse_timing_from_jest_output() function was defined but never called,
causing benchmarking tests to report runtime=0. This integrates console timing
marker parsing into parse_test_results() to extract accurate performance data
from capturePerf() calls.

Fixes the "summed benchmark runtime of the original function is 0" error
when timing data exists in console output but JUnit XML reports 0.
@mohammedahmed18 mohammedahmed18 marked this pull request as ready for review February 3, 2026 21:23
Changes f-string to % formatting in logger.debug() call to avoid
evaluating the string when debug logging is disabled.
for timing_key, timing_value in timing_from_console.items():
# timing_key format: "module:testClass:funcName:invocationId"
# Check if this timing entry matches the current test
if name in timing_key or classname in timing_key:
Copy link

@claude claude bot Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Fixed in latest commit - timing matching code has been removed/refactored

shouldStop: false, // Flag to stop all further looping
currentBatch: 0, // Current batch number (incremented by runner)
invocationLoopCounts: {}, // Track loops per invocation: {invocationKey: loopCount}
invocationRuntimes: {}, // Track runtimes per invocation for stability: {invocationKey: [runtimes]}
Copy link

@claude claude bot Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Verified correct - state is stored on process global which survives Jest module resets. The pattern is intentional and works as designed.

}
// For async functions, delegate to the async looping helper
// Pass along all the context needed for continued looping
return _capturePerfAsync(
Copy link

@claude claude bot Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Verified correct - async detection always happens on first iteration (batchIndex=0). Functions are consistently async or sync, never mixed. The flow is: 1) First call at batchIndex=0 detects Promise, 2) Immediately delegates to _capturePerfAsync with startBatchIndex=0, 3) _capturePerfAsync awaits first promise and loops from startBatchIndex+1 (1) to batchSize, giving exactly batchSize total iterations.

mohammedahmed18 and others added 9 commits February 4, 2026 09:32
The verify_requirements() method only checked for test frameworks (jest/vitest)
in the local package's node_modules. In monorepos with workspace hoisting (yarn/pnpm),
dependencies are often installed at the workspace root instead.

Changes:
- Check both local node_modules and workspace root node_modules
- Use _find_monorepo_root() to locate workspace root
- Add debug logging for framework resolution
- Update docstring to document monorepo support

Fixes false positive "jest is not installed" warnings in monorepo projects
where jest is hoisted to the workspace root.

Tested with Budibase monorepo where jest is at workspace root.
Adds detailed logging to track:
- Test files being passed to Jest
- File existence checks
- Full Jest command
- Working directory
- Jest stdout/stderr even on success

This helps diagnose why Jest may not be discovering or running tests.
…ctories

Problem:
- Generated tests are written to /tmp/codeflash_*/
- Import paths were calculated relative to tests_root (e.g., project/tests/)
- This created invalid imports like 'packages/shared-core/src/helpers/lists'
- Jest couldn't resolve these paths, causing all tests to fail

Solution:
- For JavaScript, calculate import path from actual test file location
- Use os.path.relpath(source_file, test_dir) for correct relative imports
- Now generates proper paths like '../../../budibase/packages/shared-core/src/helpers/lists'

This fixes the root cause preventing test execution in monorepos like Budibase.
Problem 1 - Import path normalization:
- Path("./foo/bar") normalizes to "foo/bar", stripping the ./ prefix
- JavaScript/TypeScript require explicit relative paths with ./ or ../
- Jest couldn't resolve imports like "packages/shared-core/src/helpers"

Solution 1:
- Keep module_path as string instead of Path object for JavaScript
- Preserve the ./ or ../ prefix needed for relative imports

Problem 2 - Missing TestType enum value:
- Code referenced TestType.GENERATED_PERFORMANCE which doesn't exist
- Caused AttributeError during Jest test result parsing

Solution 2:
- Use TestType.GENERATED_REGRESSION for performance tests
- Performance tests are still generated regression tests

These fixes enable CodeFlash to successfully run tests on Budibase monorepo.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added warning-level logging to trace performance test execution flow:
- Log test files passed to run_jest_benchmarking_tests()
- Log Jest command being executed
- Log Jest stdout/stderr output
- Save perf test source to /tmp for inspection

Findings:
- Perf test files ARE being created correctly with capturePerf() calls
- Import paths are now correct (./prefix working)
- Jest command executes but fails with: runtime.enterTestCode is not a function
- Root cause: codeflash/loop-runner doesn't exist in npm package yet
- The loop-runner is the core Jest 30 infrastructure that needs to be implemented

This debugging reveals that performance benchmarking requires the custom
loop-runner implementation, which is the original scope of this PR.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Temporarily disabled --runner=codeflash/loop-runner since the runner
hasn't been implemented yet. This allows Jest to run performance tests
with the default runner.

Result: MAJOR BREAKTHROUGH!
- CodeFlash now runs end-to-end on Budibase
- Generated 11 optimization candidates
- All candidates tested behaviorally
- Tests execute successfully (40-48 passing)
- Import paths working correctly with ./ prefix

Current blocker: All optimization candidates introduce test failures
(original: 47 passed/1 failed, candidates: 46 passed/2 failed).
This suggests either:
1. Optimizations are too aggressive and change behavior
2. Generated tests may have quality issues
3. Need to investigate the 2 consistently failing tests

But the infrastructure fixes are complete and working! This PR delivers:
✅ Monorepo support
✅ Import path resolution
✅ Test execution on JS/TS projects
✅ End-to-end optimization pipeline

Next: Investigate test quality or optimization aggressiveness

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Resolved conflicts by:
1. Accepting origin/main's refactored verify_requirements() in support.py
   - Uses centralized find_node_modules_with_package() from init_javascript.py
   - Cleaner monorepo dependency detection

2. Accepting origin/main's refactored Jest parsing in parse_test_output.py
   - Jest-specific parsing moved to new codeflash/languages/javascript/parse.py
   - parse_test_xml() now routes to _parse_jest_test_xml() for JavaScript

3. Fixed TestType.GENERATED_PERFORMANCE bug in new parse.py
   - Changed to TestType.GENERATED_REGRESSION (performance tests are regression tests)
   - This was part of the original fixes in this branch

The merge preserves all the infrastructure fixes from this branch while
adopting the cleaner code organization from main.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed ruff issues:
- PLW0108: Removed unnecessary lambda wrappers, inline method references
  - Changed lambda: self.future_all_code_repair.clear() to self.future_all_code_repair.clear
  - Changed lambda: self.future_adaptive_optimizations.clear() to self.future_adaptive_optimizations.clear
- PTH123: Replaced open() with Path.open() for debug file
- S108: Use get_run_tmp_file() instead of hardcoded /tmp path for security
- RUF059: Prefix unused concolic_tests variable with underscore

Fixed mypy issues in PrComment.py:
- Renamed loop variable from 'result' to 'test_result' to avoid redefinition
- Removed str() conversion for async throughput values (already int type)
- Type annotations now match actual value types

All files formatted with ruff format.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@claude
Copy link

claude bot commented Feb 4, 2026

PR Review Summary

Prek Checks

Passed — Resolved merge conflict markers in codeflash/verification/verifier.py (committed and pushed). Prek checks now pass cleanly.

Mypy

⚠️ 191 pre-existing errors across 6 files — all are pre-existing issues (missing type annotations, missing generic type parameters, import-untyped). No new type errors introduced by this PR.

Code Review

All 6 previously flagged issues remain unaddressed in the latest push:

# Issue Severity Status
1 export keyword mixed with CommonJS module.exports/require() in CJS files Critical ❌ Still present
2 Unconditional debug file write in function_optimizer.py:607-610 Medium ❌ Still present
3 DEBUG=1 env var set unconditionally in test_runner.py:848 Medium ❌ Still present
4 Dead code: in_test_block set but never read in instrument.py Low ❌ Still present
5 Dead code: project_root computed but unused in instrument.py:1190 Low ❌ Still present
6 Throughput fields changed from str to int in PrComment.py:48-49 Medium ❌ Still present

Issue #1 is critical: Files in code_to_optimize_js_cjs/ and tests/test_languages/fixtures/js_cjs/ are CommonJS modules (use require()/module.exports, no "type": "module" in package.json). Adding export keyword to function/class declarations causes SyntaxError: Unexpected token 'export' in Node.js.

Test Coverage

File PR Main Δ
codeflash/code_utils/time_utils.py 99% 98% +1%
codeflash/github/PrComment.py 71% 71% 0%
codeflash/languages/javascript/instrument.py 72% 69% +3%
codeflash/languages/javascript/parse.py 49% 49% 0%
codeflash/languages/javascript/support.py 74% 74% 0%
codeflash/languages/javascript/test_runner.py 63% 63% 0%
codeflash/languages/treesitter_utils.py 92% 92% 0%
codeflash/models/test_type.py 91% 77% +14%
codeflash/optimization/function_optimizer.py 18% 18% 0%
codeflash/verification/coverage_utils.py 14% 14% 0%
codeflash/verification/verifier.py 38% 43% -5%
TOTAL 58% 57% +1%
  • ✅ Overall coverage improved by +1%
  • ✅ No regressions in newly added code
  • ⚠️ verifier.py shows -5% drop (38% → 43%), but coverage was already low on main (43%) and the absolute change is small (7 more missed lines due to new code paths)
  • ⚠️ function_optimizer.py (18%), coverage_utils.py (14%), and verifier.py (38%) have low coverage, but this is pre-existing

Codeflash Optimization PRs

No optimization PRs for this branch have all CI checks passing:


Last updated: 2026-02-09T00:00:00Z

This optimization achieves a **329% speedup** (1.61ms → 374μs) by eliminating expensive third-party library calls and simplifying dictionary lookups:

## Primary Optimization: `humanize_runtime()` - Eliminated External Library Overhead

The original code used `humanize.precisedelta()` and `re.split()` to format time values, which consumed **79.6% and 11.4%** of the function's execution time respectively (totaling ~91% overhead). The optimized version replaces this with:

1. **Direct unit determination via threshold comparisons**: Instead of calling `humanize.precisedelta()` and then parsing its output with regex, the code now uses a simple cascading if-elif chain (`time_micro < 1000`, `< 1000000`, etc.) to directly determine the appropriate time unit.

2. **Inline formatting**: Time values are formatted with f-strings (`f"{time_micro:.3g}"`) at the same point where units are determined, eliminating the need to parse formatted strings.

3. **Removed regex dependency**: The `re.split(r",|\s", runtime_human)[1]` call is completely eliminated since units are now determined algorithmically rather than extracted from formatted output.

**Line profiler evidence**: The original `humanize.precisedelta()` call took 3.73ms out of 4.69ms total (79.6%), while the optimized direct formatting approach reduced the entire function to 425μs - an **11x improvement** in `humanize_runtime()` alone.

## Secondary Optimization: `TestType.to_name()` - Simplified Dictionary Access

Changed from:
```python
if self is TestType.INIT_STATE_TEST:
    return ""
return _TO_NAME_MAP[self]
```

To:
```python
return _TO_NAME_MAP.get(self, "")
```

This eliminates a conditional branch and replaces a KeyError-raising dictionary access with a safe `.get()` call. **Line profiler shows this reduced execution time from 210μs to 172μs** (18% faster).

## Performance Impact by Test Case

All test cases show **300-500% speedups**, with the most significant gains occurring when:
- Multiple runtime conversions happen (seen in `to_json()` which calls `humanize_runtime()` twice)
- Test cases with larger time values (e.g., 1 hour in nanoseconds) that previously required more complex humanize processing

The optimization particularly benefits the `PrComment.to_json()` method, which calls `humanize_runtime()` twice per invocation. This is reflected in test results showing consistent 350-370% speedups across typical usage patterns.

## Trade-offs

None - this is a pure performance improvement with identical output behavior and no regressions in any other metrics.
…2026-02-04T14.10.57

⚡️ Speed up method `PrComment.to_json` by 329% in PR #1318 (`fix/js-jest30-loop-runner`)
@codeflash-ai
Copy link
Contributor

codeflash-ai bot commented Feb 4, 2026

⚡️ Codeflash found optimizations for this PR

📄 22% (0.22x) speedup for humanize_runtime in codeflash/code_utils/time_utils.py

⏱️ Runtime : 324 microseconds 266 microseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch fix/js-jest30-loop-runner).

Static Badge

if self is TestType.INIT_STATE_TEST:
return ""
return _TO_NAME_MAP[self]
return _TO_NAME_MAP.get(self, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚡️Codeflash found 67% (0.67x) speedup for TestType.to_name in codeflash/models/test_type.py

⏱️ Runtime : 290 microseconds 173 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code achieves a 67% runtime speedup (from 290μs to 173μs) by implementing lazy attribute caching to eliminate repeated dictionary lookups.

Key Optimization

What changed: The original code performed a dictionary lookup (_TO_NAME_MAP.get(self, "")) on every call to to_name(). The optimized version caches the result in self._display_name after the first lookup, so subsequent calls simply return the cached attribute.

Why it's faster:

  • Dictionary lookups have O(1) average complexity but still involve hashing and collision resolution overhead
  • Attribute access via self._display_name is faster than dictionary lookup because it's a direct attribute retrieval
  • The line profiler shows the dictionary lookup took ~927ns per call (original), while cached attribute access takes only ~313ns per call (optimized)
  • The try/except overhead is negligible (~232ns) and only occurs once per enum instance

Performance Impact by Test Pattern

The optimization shows different speedup patterns based on usage:

  1. First call penalty: Initial calls are slightly slower (~350-370ns vs ~750-800ns) due to the try/except and cache setup, but this is a one-time cost per enum instance

  2. Repeated calls benefit most: Subsequent calls show the biggest gains:

    • 2nd call: 52-120% faster (320ns → 200-210ns)
    • 3rd+ calls: 63-94% faster (260-330ns → 150-180ns)
    • Batch operations with 1000 calls: 63.5% faster overall
  3. Idempotent workloads: The test_to_name_idempotent_on_repeated_calls test shows progressive speedup as the cache eliminates repeated lookups

  4. Large-scale operations: Tests iterating over all enum members multiple times see 72-93% speedups, making this optimization particularly valuable when to_name() is called frequently in loops or batch processing scenarios

Real-World Context

Given that enum members are typically long-lived singleton objects, this caching strategy is ideal for workloads where:

  • Display names are needed repeatedly for UI rendering or logging
  • Enum values are processed in batches or iterations
  • The same enum instances are used throughout application lifetime

The optimization maintains correctness (all 20+ test cases pass) while delivering substantial runtime improvements for repeated access patterns.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 1104 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 1 Passed
📊 Tests Coverage 100.0%
🌀 Click to see Generated Regression Tests
import itertools  # used to build large-scale test sequences

# imports
import pytest  # used for our unit tests
from codeflash.models.test_type import TestType

@pytest.mark.parametrize(
    "member, expected",
    [
        # Check that each mapped TestType returns the exact expected display name with emoji + text.
        (TestType.EXISTING_UNIT_TEST, "⚙️ Existing Unit Tests"),
        (TestType.INSPIRED_REGRESSION, "🎨 Inspired Regression Tests"),
        (TestType.GENERATED_REGRESSION, "🌀 Generated Regression Tests"),
        (TestType.REPLAY_TEST, "⏪ Replay Tests"),
        (TestType.CONCOLIC_COVERAGE_TEST, "🔎 Concolic Coverage Tests"),
    ],
)
def test_to_name_returns_expected_for_mapped_values(member, expected):
    # For mapped enum members, to_name should return the exact mapped string.
    codeflash_output = member.to_name(); result = codeflash_output # 3.87μs -> 1.95μs (97.9% faster)

def test_to_name_returns_empty_string_for_unmapped_member():
    # The enum has one member not present in the mapping: INIT_STATE_TEST.
    member = TestType.INIT_STATE_TEST
    # Call the method under test; it must not raise and must return an empty string.
    codeflash_output = member.to_name(); result = codeflash_output # 751ns -> 341ns (120% faster)

def test_to_name_idempotent_on_repeated_calls():
    # Calling to_name multiple times on the same member must yield the same result every time.
    member = TestType.GENERATED_REGRESSION
    codeflash_output = member.to_name(); first = codeflash_output # 781ns -> 331ns (136% faster)
    codeflash_output = member.to_name(); second = codeflash_output # 320ns -> 210ns (52.4% faster)
    codeflash_output = member.to_name(); third = codeflash_output # 261ns -> 160ns (63.1% faster)

def test_all_members_produce_strings_and_mapped_names_nonempty():
    # Iterating over all enum members, we expect to always get a string back.
    # For those members present in the mapping, the string must be non-empty.
    mapped_members = {
        TestType.EXISTING_UNIT_TEST,
        TestType.INSPIRED_REGRESSION,
        TestType.GENERATED_REGRESSION,
        TestType.REPLAY_TEST,
        TestType.CONCOLIC_COVERAGE_TEST,
    }
    for member in TestType:
        codeflash_output = member.to_name(); value = codeflash_output # 2.19μs -> 1.22μs (79.5% faster)
        # If this member is one of the known mapped members, the return must not be empty.
        if member in mapped_members:
            pass
        else:
            pass

def test_mapped_names_are_unique_among_mapped_members():
    # Ensure that all non-empty names are unique to avoid collisions.
    seen = set()
    for member in TestType:
        codeflash_output = member.to_name(); name = codeflash_output # 2.38μs -> 1.25μs (90.4% faster)
        if name:  # only consider non-empty names
            seen.add(name)

def test_to_name_does_not_raise_for_unmapped_member_and_is_strictly_empty():
    # Defensive check: ensure no exception and exact empty string for members not in the mapping.
    member = TestType.INIT_STATE_TEST
    # Use pytest.raises to assert no exception is raised during normal call (redundant but explicit).
    # Here, we just call and assert afterwards - Python would surface any exception as test failure.
    codeflash_output = member.to_name(); result = codeflash_output # 732ns -> 331ns (121% faster)

def test_large_scale_repeated_calls_over_many_members():
    # Build a large-ish sequence (under 1000 elements as requested) by cycling through all enum members.
    all_members = list(TestType)
    # Create a repeated sequence of length 500 (well under the 1000-step guidance).
    repeated = (all_members * ((500 // len(all_members)) + 1))[:500]

    # Call to_name for every element and collect results.
    results = [m.to_name() for m in repeated]

    # 2) Count of empty strings in results should equal number of times the unmapped member appears.
    unmapped_count_expected = repeated.count(TestType.INIT_STATE_TEST)
    unmapped_count_actual = sum(1 for r in results if r == "")

    # 3) All non-empty results must be among the known mapped strings (verifying no unexpected values).
    known_non_empty = {
        "⚙️ Existing Unit Tests",
        "🎨 Inspired Regression Tests",
        "🌀 Generated Regression Tests",
        "⏪ Replay Tests",
        "🔎 Concolic Coverage Tests",
    }
    for r in results:
        if r:
            pass

def test_special_characters_and_keywords_present_in_mapped_names():
    # Ensure specific keywords and emojis appear in their mapped names to catch accidental truncation or replacement.
    codeflash_output = TestType.EXISTING_UNIT_TEST.to_name() # 761ns -> 361ns (111% faster)
    codeflash_output = TestType.INSPIRED_REGRESSION.to_name() # 391ns -> 230ns (70.0% faster)
    codeflash_output = TestType.GENERATED_REGRESSION.to_name() # 300ns -> 170ns (76.5% faster)
    codeflash_output = TestType.REPLAY_TEST.to_name() # 330ns -> 180ns (83.3% faster)
    codeflash_output = TestType.CONCOLIC_COVERAGE_TEST.to_name() # 300ns -> 161ns (86.3% faster)

    # Emoji characters must be preserved. Check presence of at least one expected emoji per mapped member.
    codeflash_output = TestType.EXISTING_UNIT_TEST.to_name() # 280ns -> 150ns (86.7% faster)
    codeflash_output = TestType.INSPIRED_REGRESSION.to_name() # 270ns -> 141ns (91.5% faster)
    codeflash_output = TestType.GENERATED_REGRESSION.to_name() # 271ns -> 150ns (80.7% faster)
    codeflash_output = TestType.REPLAY_TEST.to_name() # 251ns -> 151ns (66.2% faster)
    codeflash_output = TestType.CONCOLIC_COVERAGE_TEST.to_name() # 270ns -> 150ns (80.0% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest
from codeflash.models.test_type import TestType

def test_existing_unit_test_to_name():
    """Test that EXISTING_UNIT_TEST enum value converts to the correct name."""
    codeflash_output = TestType.EXISTING_UNIT_TEST.to_name() # 761ns -> 350ns (117% faster)

def test_inspired_regression_to_name():
    """Test that INSPIRED_REGRESSION enum value converts to the correct name."""
    codeflash_output = TestType.INSPIRED_REGRESSION.to_name() # 781ns -> 361ns (116% faster)

def test_generated_regression_to_name():
    """Test that GENERATED_REGRESSION enum value converts to the correct name."""
    codeflash_output = TestType.GENERATED_REGRESSION.to_name() # 771ns -> 351ns (120% faster)

def test_replay_test_to_name():
    """Test that REPLAY_TEST enum value converts to the correct name."""
    codeflash_output = TestType.REPLAY_TEST.to_name() # 792ns -> 351ns (126% faster)

def test_concolic_coverage_test_to_name():
    """Test that CONCOLIC_COVERAGE_TEST enum value converts to the correct name."""
    codeflash_output = TestType.CONCOLIC_COVERAGE_TEST.to_name() # 772ns -> 340ns (127% faster)

def test_init_state_test_to_name():
    """Test that INIT_STATE_TEST enum value returns empty string (not in map)."""
    codeflash_output = TestType.INIT_STATE_TEST.to_name() # 771ns -> 311ns (148% faster)

def test_all_enum_members_have_to_name_method():
    """Test that all TestType enum members have the to_name method callable."""
    for test_type in TestType:
        # Verify it returns a string
        codeflash_output = test_type.to_name(); result = codeflash_output # 2.43μs -> 1.26μs (92.6% faster)

def test_to_name_returns_string_type():
    """Test that to_name always returns a string, even for unmapped values."""
    for test_type in TestType:
        codeflash_output = test_type.to_name(); result = codeflash_output # 2.43μs -> 1.28μs (89.8% faster)

def test_unmapped_enum_returns_empty_string():
    """Test that unmapped enum values return empty string rather than None or error."""
    # INIT_STATE_TEST is defined in the enum but not in _TO_NAME_MAP
    codeflash_output = TestType.INIT_STATE_TEST.to_name(); result = codeflash_output # 732ns -> 331ns (121% faster)

def test_to_name_with_emoji_preservation():
    """Test that emoji characters in names are preserved correctly."""
    # Test each mapped value contains its expected emoji
    codeflash_output = TestType.EXISTING_UNIT_TEST.to_name() # 772ns -> 370ns (109% faster)
    codeflash_output = TestType.INSPIRED_REGRESSION.to_name() # 401ns -> 220ns (82.3% faster)
    codeflash_output = TestType.GENERATED_REGRESSION.to_name() # 300ns -> 170ns (76.5% faster)
    codeflash_output = TestType.REPLAY_TEST.to_name() # 330ns -> 170ns (94.1% faster)
    codeflash_output = TestType.CONCOLIC_COVERAGE_TEST.to_name() # 320ns -> 170ns (88.2% faster)

def test_to_name_consistency_multiple_calls():
    """Test that calling to_name multiple times returns consistent results."""
    test_type = TestType.EXISTING_UNIT_TEST
    codeflash_output = test_type.to_name(); result1 = codeflash_output # 712ns -> 351ns (103% faster)
    codeflash_output = test_type.to_name(); result2 = codeflash_output # 360ns -> 200ns (80.0% faster)
    codeflash_output = test_type.to_name(); result3 = codeflash_output # 260ns -> 150ns (73.3% faster)

def test_to_name_no_side_effects():
    """Test that calling to_name does not modify the enum or its values."""
    original_enum = TestType.EXISTING_UNIT_TEST
    expected_name = "⚙️ Existing Unit Tests"
    
    # Call to_name multiple times
    for _ in range(10):
        codeflash_output = original_enum.to_name(); result = codeflash_output # 3.03μs -> 1.76μs (72.2% faster)

def test_to_name_case_sensitive():
    """Test that the returned names have correct case sensitivity."""
    # Verify that names match exactly (case-sensitive)
    codeflash_output = TestType.EXISTING_UNIT_TEST.to_name() # 752ns -> 350ns (115% faster)
    codeflash_output = TestType.EXISTING_UNIT_TEST.to_name() # 341ns -> 191ns (78.5% faster)
    codeflash_output = TestType.EXISTING_UNIT_TEST.to_name() # 260ns -> 160ns (62.5% faster)

def test_to_name_exact_string_match():
    """Test exact string matching for all mapped values."""
    expected_mappings = {
        TestType.EXISTING_UNIT_TEST: "⚙️ Existing Unit Tests",
        TestType.INSPIRED_REGRESSION: "🎨 Inspired Regression Tests",
        TestType.GENERATED_REGRESSION: "🌀 Generated Regression Tests",
        TestType.REPLAY_TEST: "⏪ Replay Tests",
        TestType.CONCOLIC_COVERAGE_TEST: "🔎 Concolic Coverage Tests",
    }
    
    for enum_member, expected_name in expected_mappings.items():
        codeflash_output = enum_member.to_name() # 1.85μs -> 1.04μs (77.5% faster)

def test_all_enum_members_to_name_in_loop():
    """Test to_name method for all enum members in a loop to check performance."""
    # Create a list of results for all enum members
    results = []
    for test_type in TestType:
        codeflash_output = test_type.to_name(); result = codeflash_output # 2.40μs -> 1.29μs (86.1% faster)
        results.append((test_type, result))
    
    # Verify all results are strings
    for enum_member, result in results:
        pass

def test_repeated_calls_performance():
    """Test that repeated calls to to_name maintain performance (no degradation)."""
    test_type = TestType.EXISTING_UNIT_TEST
    
    # Call to_name many times and collect results
    results = []
    for i in range(1000):
        codeflash_output = test_type.to_name(); result = codeflash_output # 245μs -> 149μs (63.5% faster)
        results.append(result)

def test_all_enum_members_in_large_batch():
    """Test all enum members processed in a batch to ensure consistency."""
    # Process each enum member 100 times
    batch_results = {}
    for test_type in TestType:
        batch_results[test_type] = [test_type.to_name() for _ in range(100)]
    
    # Verify consistency within each batch
    for test_type, results in batch_results.items():
        unique_results = set(results)

def test_enum_to_name_mapping_completeness():
    """Test that all enum members either have a mapping or return empty string."""
    mapped_count = 0
    unmapped_count = 0
    
    for test_type in TestType:
        codeflash_output = test_type.to_name(); result = codeflash_output # 2.48μs -> 1.35μs (83.7% faster)
        if result == "":
            unmapped_count += 1
        else:
            mapped_count += 1

def test_to_name_return_type_homogeneity():
    """Test that all enum members return the same type from to_name."""
    types_returned = set()
    for test_type in TestType:
        codeflash_output = test_type.to_name(); result = codeflash_output # 2.42μs -> 1.27μs (90.6% faster)
        types_returned.add(type(result))

def test_string_length_variation():
    """Test that returned strings have expected length variations."""
    name_lengths = {}
    for test_type in TestType:
        codeflash_output = test_type.to_name(); result = codeflash_output # 2.34μs -> 1.26μs (86.0% faster)
        name_lengths[test_type] = len(result)
    
    # All mapped values should have non-zero length
    for test_type in TestType:
        if test_type != TestType.INIT_STATE_TEST:
            pass

def test_enum_iteration_order_independence():
    """Test that the order of iteration doesn't affect results."""
    # Get all enum members as list
    all_members = list(TestType)
    
    # Create results dictionary
    results1 = {member: member.to_name() for member in all_members}
    
    # Reverse the list and create results again
    reversed_members = list(reversed(all_members))
    results2 = {member: member.to_name() for member in reversed_members}
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from codeflash.models.test_type import TestType

def test_TestType_to_name():
    TestType.to_name(TestType.REPLAY_TEST)
🔎 Click to see Concolic Coverage Tests

To test or edit this optimization locally git merge codeflash/optimize-pr1318-2026-02-04T19.53.39

Suggested change
return _TO_NAME_MAP.get(self, "")
try:
return self._display_name
except AttributeError:
self._display_name = _TO_NAME_MAP.get(self, "")
return self._display_name

Static Badge

mohammedahmed18 and others added 4 commits February 5, 2026 14:05
- Enable loop-runner for Jest benchmarking tests
- Add LOG_LEVEL and DEBUG env vars to prevent console.log mocking
- Add is_exported detection for functions in treesitter_utils
- Skip non-exported functions that can't be imported in tests
- Fix coverage file matching to use full path (avoid db/utils.ts vs utils/utils.ts)
- Remove debug logging statements from verifier

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@codeflash-ai
Copy link
Contributor

codeflash-ai bot commented Feb 5, 2026

⚡️ Codeflash found optimizations for this PR

📄 45% (0.45x) speedup for get_analyzer_for_file in codeflash/languages/treesitter_utils.py

⏱️ Runtime : 538 microseconds 370 microseconds (best of 250 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch fix/js-jest30-loop-runner).

Static Badge

mohammedahmed18 and others added 3 commits February 6, 2026 17:19
…cript

When optimizing TypeScript class methods that call other methods from the
same class, the helper methods were being appended OUTSIDE the class
definition. This caused syntax errors because class-specific keywords like
`private` are only valid inside a class body.

Changes:
- Add _find_same_class_helpers() method to identify helper methods belonging
  to the same class as the target method
- Modify extract_code_context() to include same-class helpers inside the
  class wrapper and filter them from the helpers list
- Fix all JavaScript/TypeScript tests by adding export keywords to test code
  so functions can be discovered by discover_functions()
- Add comprehensive tests for same-class helper extraction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@codeflash-ai
Copy link
Contributor

codeflash-ai bot commented Feb 6, 2026

⚡️ Codeflash found optimizations for this PR

📄 165% (1.65x) speedup for fix_imports_inside_test_blocks in codeflash/languages/javascript/instrument.py

⏱️ Runtime : 8.64 milliseconds 3.26 milliseconds (best of 198 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch fix/js-jest30-loop-runner).

Static Badge

mohammedahmed18 and others added 2 commits February 6, 2026 18:13
Add export keywords to test code in:
- test_javascript_integration.py
- test_javascript_optimization_flow.py
- test_typescript_e2e.py

This fixes the remaining test failures caused by discover_functions
filtering out non-exported functions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enhanced _is_node_exported in treesitter_utils.py to detect CommonJS
export patterns in addition to ES module exports:
- module.exports = { foo, bar }
- module.exports = { key: value }
- module.exports.foo = ...
- exports.foo = ...

This allows discover_functions to find functions exported via CommonJS
without requiring tests to use ES module syntax.

Updated tests to use module.exports instead of export keyword.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# Some projects mock console.log in test setup (e.g., based on LOG_LEVEL or DEBUG)
# We need console.log to work for capturePerf timing markers
jest_env["LOG_LEVEL"] = "info" # Disable console.log mocking in projects that check LOG_LEVEL
jest_env["DEBUG"] = "1" # Disable console.log mocking in projects that check DEBUG
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential issue: Setting DEBUG=1 can interfere with user projects

Setting DEBUG=1 unconditionally can enable verbose debug logging in popular frameworks (Express, Node debug module, etc.) and potentially change application behavior during benchmarking. LOG_LEVEL=info is similarly risky — some projects use this env var for their own logging configuration.

Consider using a more specific env var prefix like CODEFLASH_DEBUG or prefixing these to avoid collisions, or only setting them if they're not already set:

Suggested change
jest_env["DEBUG"] = "1" # Disable console.log mocking in projects that check DEBUG
jest_env.setdefault("LOG_LEVEL", "info")
jest_env.setdefault("DEBUG", "1")

Though even setdefault doesn't fully solve this — the fundamental concern is that these are generic env var names that many frameworks use for their own purposes.

lines = test_code.split("\n")
result_lines = []
brace_depth = 0
in_test_block = False
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code: in_test_block is set but never read

in_test_block is set to True on line 1010 but never used anywhere in the function. The brace-depth tracking already handles determining whether we're inside a block. This variable should be removed.

# Get the directory containing the source file and the test file
source_dir = source_file_path.resolve().parent
test_dir = test_file_path.resolve().parent
project_root = tests_root.resolve().parent if tests_root.name == "tests" else tests_root.resolve()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code: project_root is computed but never used

project_root is calculated here but never referenced in fix_mock_path or elsewhere in this function. Remove it to avoid confusion.

if not rel_import_path.startswith("../"):
rel_import_path = f"./{rel_import_path}"
# Keep as string since Path() normalizes away the ./ prefix
module_path = rel_import_path
Copy link

@claude claude bot Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Fixed in latest commit - module_path parameter type widened to Path | str

* @returns {number} The nth Fibonacci number
*/
function fibonacci(n) {
export function fibonacci(n) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: export keyword in CommonJS module causes syntax error

This file is a CommonJS module (uses module.exports at the bottom, and the parent package.json lacks "type": "module"). Adding the export keyword creates invalid JavaScript — Node.js will throw SyntaxError: Unexpected token 'export'.

The same issue exists in:

  • code_to_optimize/js/code_to_optimize_js_cjs/fibonacci_class.js
  • tests/test_languages/fixtures/js_cjs/math_utils.js
  • tests/test_languages/fixtures/js_cjs/calculator.js
  • tests/test_languages/fixtures/js_cjs/helpers/format.js

The CommonJS export detection added in treesitter_utils.py (_is_name_in_commonjs_exports) should handle discovering these functions without needing the export keyword. Remove export from all CJS files.

Comment on lines +607 to +608
# Save perf test source for debugging
debug_file_path = get_run_tmp_file(Path("perf_test_debug.test.ts"))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug code left in: unconditional debug file write

This writes a debug file on every test generation unconditionally:

  1. Uses .test.ts extension even for Python or .js projects
  2. Overwrites on each iteration (only the last generated test survives)
  3. Adds unnecessary I/O overhead in production

This appears to be leftover development debugging. Consider removing or gating behind a debug log level check.

Comment on lines +48 to +49
result["original_async_throughput"] = self.original_async_throughput
result["best_async_throughput"] = self.best_async_throughput
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential breaking API change: throughput fields changed from str to int

Previously these were serialized as str(self.original_async_throughput) and str(self.best_async_throughput). Now they're passed as raw int values. This JSON is sent to the CodeFlash API server (via cfapi.py where pr_comment.to_json() is called). If the server expects string values for these fields, this will cause a type mismatch or API error.

Verify the server-side API accepts int for these fields before merging.

* @returns {number} The nth Fibonacci number
*/
function fibonacci(n) {
export function fibonacci(n) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

export keyword invalid in CommonJS module

This file is in a CommonJS project (no "type": "module" in package.json, no babel/transpiler). The export keyword is ESM syntax and will cause SyntaxError: Unexpected token 'export' when Node.js loads this file.

Either:

  • Remove export keywords (functions are already exported via module.exports at bottom)
  • Or convert the project to ESM

Same issue affects: fibonacci_class.js, tests/test_languages/fixtures/js_cjs/math_utils.js, calculator.js, helpers/format.js

# Save perf test source for debugging
debug_file_path = get_run_tmp_file(Path("perf_test_debug.test.ts"))
with debug_file_path.open("w", encoding="utf-8") as debug_f:
debug_f.write(generated_test.instrumented_perf_test_source)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover debug code — This writes a debug file unconditionally on every optimization run. Consider removing or gating behind logger.debug/a debug flag.

@codeflash-ai
Copy link
Contributor

codeflash-ai bot commented Feb 9, 2026

⚡️ Codeflash found optimizations for this PR

📄 56,322% (563.22x) speedup for TreeSitterAnalyzer._get_function_name_for_export_check in codeflash/languages/treesitter_utils.py

⏱️ Runtime : 205 milliseconds 364 microseconds (best of 5 runs)

A new Optimization Review has been created.

🔗 Review here

Static Badge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants