From ce4650991cbcf126e75c050c81fa414f0606bafb Mon Sep 17 00:00:00 2001
From: "codeflash-ai[bot]"
 <148906541+codeflash-ai[bot]@users.noreply.github.com>
Date: Thu, 26 Feb 2026 00:24:33 +0000
Subject: [PATCH] Optimize discover_functions_from_source
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Runtime improvement: the optimized version reduces end-to-end execution time from ~10.7ms to ~8.24ms (~29% speedup), with the biggest wins on workloads that enumerate many methods (the 1,000-method tests run ~26–34% faster).
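As a quick sanity check on the headline numbers, the sketch below assumes "speedup" means time saved divided by the optimized runtime (a common convention; the exact definition used by the reporting tool is an assumption here). Because both timings are already rounded, it lands at roughly 30%, consistent with the quoted ~29%, and it also shows that the same change removes about 23% of the original runtime.

```python
# Headline timings quoted above (both already rounded).
original_ms = 10.7
optimized_ms = 8.24

# Assumed definition: gain relative to the new (optimized) runtime.
speedup = (original_ms - optimized_ms) / optimized_ms

# Fraction of the original runtime that was eliminated.
time_saved = (original_ms - optimized_ms) / original_ms

print(f"speedup: {speedup:.1%}, time saved: {time_saved:.1%}")
# prints: speedup: 29.9%, time saved: 23.0%
```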

What changed (specific optimizations)
- Hoisted the default file path out of the per-method loop and cached it in resolved_file_path = file_path or Path("unknown.java"). When no file_path is passed, this avoids constructing a new Path object every time a FunctionToOptimize is created; when one is passed, the or expression short-circuits and neither version builds a Path.
- Built the parents list with a single conditional expression instead of creating an empty list and then calling .append() for methods that belong to a class.
- Localized frequently accessed method attributes in _should_include_method (name, class_name, return_type) into local variables to reduce repeated attribute lookups inside the hot predicate logic.
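The allocation argument can be checked directly. The sketch below uses hypothetical helpers (per_iteration, hoisted, distinct_objects) that mimic the loop pattern in isolation rather than the codeflash code itself, and counts how many distinct Path objects each version produces. It also shows that the saving only applies when file_path is None: when a path is passed, the or expression short-circuits and the same object is reused in both versions.

```python
from pathlib import Path
from typing import Optional


def per_iteration(file_path: Optional[Path], n: int) -> list[Path]:
    # Original pattern: the default is re-evaluated for every method.
    return [file_path or Path("unknown.java") for _ in range(n)]


def hoisted(file_path: Optional[Path], n: int) -> list[Path]:
    # Optimized pattern: resolve the default once, reuse it for every method.
    resolved_file_path = file_path or Path("unknown.java")
    return [resolved_file_path for _ in range(n)]


def distinct_objects(paths: list[Path]) -> int:
    # Every object stays alive inside the list, so ids cannot be recycled.
    return len({id(p) for p in paths})


print(distinct_objects(per_iteration(None, 1000)))            # 1000 separate Path objects
print(distinct_objects(hoisted(None, 1000)))                  # 1 shared Path object
print(distinct_objects(per_iteration(Path("A.java"), 1000)))  # 1: the or short-circuits
print(per_iteration(None, 3) == hoisted(None, 3))             # True: values are equal
```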

Why this speeds things up (mechanics)
- Path() allocation cost: the original code evaluated file_path or Path("unknown.java") on every loop iteration when constructing FunctionToOptimize, which, with no file_path supplied, meant one new Path per method. The profiler shows that line as one of the dominant costs. Moving the work out of the loop removes an allocation and the associated Python call overhead from each iteration, so the savings scale with the number of methods.
- Fewer attribute lookups: in _should_include_method, method.name is read up to three times and method.class_name up to twice per call. Binding each to a local once replaces the repeated attribute lookups with fast local loads (LOAD_FAST rather than LOAD_ATTR in CPython bytecode), which matters when the filter runs thousands of times. return_type is read only once in either version, so localizing it is mostly for consistency.
- Fewer operations: building parents in a single expression skips the .append() method lookup and call for every method that belongs to a class; classless methods get an empty list in both versions.
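The attribute-lookup point can be made concrete with the standard-library dis module. The two functions below are simplified, hypothetical versions of the name checks in _should_include_method (they are only compiled and inspected, never called), and count_attr_loads is likewise an illustrative helper. Counting LOAD_ATTR instructions for name should show three lookups in the original shape and one after localizing; opcode details vary between CPython versions, so treat this as a sketch.

```python
import dis


def include_original(method, criteria):
    # Original shape: method.name is looked up again for every check.
    if method.name == method.class_name:
        return False
    if not criteria.matches_include_patterns(method.name):
        return False
    if criteria.matches_exclude_patterns(method.name):
        return False
    return True


def include_localized(method, criteria):
    # Optimized shape: one attribute lookup, then fast local loads.
    name = method.name
    class_name = method.class_name
    if name == class_name:
        return False
    if not criteria.matches_include_patterns(name):
        return False
    if criteria.matches_exclude_patterns(name):
        return False
    return True


def count_attr_loads(func, attr: str) -> int:
    # Count bytecode instructions that load the given attribute name.
    return sum(
        1
        for ins in dis.get_instructions(func)
        if ins.opname == "LOAD_ATTR" and ins.argval == attr
    )


print(count_attr_loads(include_original, "name"))   # 3
print(count_attr_loads(include_localized, "name"))  # 1
```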

Behavior / dependency changes
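- No behavioral change: the filters and returned FunctionToOptimize objects are constructed identically; the code still uses the same analyzer and criteria. No dependencies were added or removed.
- Minor implementation detail: resolved_file_path is computed once rather than evaluating file_path or Path(...) on every iteration, purely a micro-optimization. When file_path is None, all returned FunctionToOptimize objects now share one Path("unknown.java") instance instead of holding separate but equal ones; pathlib paths are immutable, so this is safe.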
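To check the "no behavioral change" claim for the parents rewrite, the sketch below compares both constructions. FunctionParent here is a hypothetical frozen dataclass standing in for codeflash's own class and is assumed to compare by value; parents_original and parents_optimized are illustrative names. Both versions test class_name for truthiness, so None and an empty string both yield an empty list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FunctionParent:
    # Hypothetical stand-in for codeflash's FunctionParent (value equality).
    name: str
    type: str


def parents_original(class_name: Optional[str]) -> list[FunctionParent]:
    # Two-step construction: empty list, then append for class members.
    parents: list[FunctionParent] = []
    if class_name:
        parents.append(FunctionParent(name=class_name, type="ClassDef"))
    return parents


def parents_optimized(class_name: Optional[str]) -> list[FunctionParent]:
    # Single conditional expression, as in the patch.
    return [FunctionParent(name=class_name, type="ClassDef")] if class_name else []


for class_name in ("Calculator", None, ""):
    assert parents_original(class_name) == parents_optimized(class_name)
print("parents construction is equivalent")
```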

Impact on workloads and hot paths
- This function is in a hot path: discover_functions_from_source is called by code that parses Java files and then extracts contexts (see tests and function_references). For large files or projects (many methods per file), the per-method savings compound, so throughput and latency improve noticeably.
- Best-case scenarios: large-scale processing of many methods per file (the large tests show the biggest relative gains).
- Small inputs: for tiny inputs (zero or one method), the constant overhead of the extra assignment and micro-benchmark noise can make some individual tests appear slightly slower. The profiler and annotated tests show a few micro-test regressions, but these are small absolute changes and are a reasonable trade-off for the large-scale improvements.
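A rough way to see why the gains grow with method count while tiny inputs are dominated by noise is to time the Path-resolution pattern alone at two sizes. This is a hypothetical micro-benchmark of the pattern in isolation (per_iteration, hoisted, and best_time are illustrative names), not the codeflash test suite; absolute numbers depend on the machine and Python version. Taking the minimum over several repeats is a common way to filter out scheduler noise.

```python
import timeit
from pathlib import Path


def per_iteration(n: int) -> list[Path]:
    # Original pattern with file_path=None: one Path allocation per method.
    file_path = None
    return [file_path or Path("unknown.java") for _ in range(n)]


def hoisted(n: int) -> list[Path]:
    # Optimized pattern: a single allocation regardless of n.
    file_path = None
    resolved_file_path = file_path or Path("unknown.java")
    return [resolved_file_path for _ in range(n)]


def best_time(func, n: int, number: int = 20, repeat: int = 5) -> float:
    # Minimum over repeats filters out noise from other processes.
    return min(timeit.repeat(lambda: func(n), number=number, repeat=repeat))


results = {}
for n in (1, 1000):
    results[n] = (best_time(per_iteration, n), best_time(hoisted, n))
    before, after = results[n]
    print(f"n={n:>4}: per-iteration {before:.6f}s, hoisted {after:.6f}s")
```

At n=1 both versions perform a single Path construction, so any difference is noise; at n=1000 the per-iteration version performs a thousand constructions versus one.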

Test signal
- Unit and regression tests in the provided suite produce the same results as before; tests that exercise large numbers of methods show consistent speedups. A handful of very small-case tests report marginally slower times due to fixed per-call overheads, which is acceptable given the throughput gains on real workloads.

In short: the optimization focuses on reducing per-method CPU and allocation overhead in a hot loop (avoid repeated Path allocations, reduce attribute lookups, and remove small temporaries). Those reductions compound across many methods and produce the observed ~29% runtime improvement.
---
 codeflash/languages/java/discovery.py | 31 ++++++++++++++++++---------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/codeflash/languages/java/discovery.py b/codeflash/languages/java/discovery.py
index 293120c09..d2d038c5a 100644
--- a/codeflash/languages/java/discovery.py
+++ b/codeflash/languages/java/discovery.py
@@ -79,6 +79,10 @@ def discover_functions_from_source(
             include_static=True,
         )
 
+
+        # Cache the resolved file_path to avoid creating Path("unknown.java") repeatedly
+        resolved_file_path = file_path or Path("unknown.java")
+
         functions: list[FunctionToOptimize] = []
 
         for method in methods:
@@ -87,21 +91,22 @@ def discover_functions_from_source(
                 continue
 
             # Build parents list
-            parents: list[FunctionParent] = []
-            if method.class_name:
-                parents.append(FunctionParent(name=method.class_name, type="ClassDef"))
+            parents: list[FunctionParent] = (
+                [FunctionParent(name=method.class_name, type="ClassDef")] if method.class_name else []
+            )
+
 
             functions.append(
                 FunctionToOptimize(
                     function_name=method.name,
-                    file_path=file_path or Path("unknown.java"),
+                    file_path=resolved_file_path,
                     starting_line=method.start_line,
                     ending_line=method.end_line,
                     starting_col=method.start_col,
                     ending_col=method.end_col,
                     parents=parents,
                     is_async=False,  # Java doesn't have async keyword
-                    is_method=method.class_name is not None,
+                    is_method=(method.class_name is not None),
                     language="java",
                     doc_start_line=method.javadoc_start_line,
                     return_type=method.return_type,
@@ -130,30 +135,36 @@ def _should_include_method(
         True if the method should be included.
 
     """
+    # Localize frequently used attributes to reduce attribute lookups in the hot path
+    # (name is read three times below and class_name twice)
+    name = method.name
+    class_name = method.class_name
+    return_type = method.return_type
+
     # Skip abstract methods (no implementation to optimize)
     if method.is_abstract:
         return False
 
     # Skip constructors (special case - could be optimized but usually not)
-    if method.name == method.class_name:
+    if name == class_name:
         return False
 
     # Check include patterns
-    if not criteria.matches_include_patterns(method.name):
+    if not criteria.matches_include_patterns(name):
         return False
 
     # Check exclude patterns
-    if criteria.matches_exclude_patterns(method.name):
+    if criteria.matches_exclude_patterns(name):
         return False
 
     # Check require_return - void methods are allowed (verified via test pass/fail),
     # but non-void methods must have an actual return statement
     if criteria.require_return:
-        if method.return_type != "void" and not analyzer.has_return_statement(method, source):
+        if return_type != "void" and not analyzer.has_return_statement(method, source):
             return False
 
     # Check include_methods - in Java, all functions in classes are methods
-    if not criteria.include_methods and method.class_name is not None:
+    if not criteria.include_methods and class_name is not None:
         return False
 
     # Check line count