Conversation

codeflash-ai bot commented Feb 2, 2026

⚡️ This pull request contains optimizations for PR #1199

If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

This PR will be automatically closed if the original PR is merged.


📄 144% (2.44x) speedup for _find_type_node in codeflash/languages/java/context.py

⏱️ Runtime : 169 microseconds → 69.3 microseconds (best of 22 runs)

📝 Explanation and details

The optimized code achieves a 143% speedup (from 169μs to 69.3μs) through three key performance improvements:

1. Eliminated Dictionary Reconstruction Overhead (25.3% of original time)
The original code rebuilt the type_declarations dictionary on every recursive call (329 times in the profiler). By hoisting it to a module-level constant TYPE_DECLARATIONS, this overhead is completely eliminated. The profiler shows ~340μs spent creating this dict repeatedly in the original version.
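For illustration, a minimal before/after sketch of this hoisting; the mapping contents are an assumption based on the declaration node types exercised by the tests below, not the exact merged code:

```python
# Before: the dict literal is re-evaluated on every recursive call.
def _find_type_node_before(node, type_name, source_bytes):
    type_declarations = {  # rebuilt each call -- pure overhead
        "class_declaration": "class",
        "interface_declaration": "interface",
        "enum_declaration": "enum",
    }
    ...  # traversal logic elided

# After: the mapping is built once at import time and shared by all calls.
TYPE_DECLARATIONS = {
    "class_declaration": "class",
    "interface_declaration": "interface",
    "enum_declaration": "enum",
}
```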

2. Replaced Recursion with Iterative Stack-Based DFS
The original code made 319 recursive calls (_find_type_node(child, type_name, source_bytes) at 638μs). Each recursive call incurs Python function call overhead including frame creation, argument passing, and local variable setup. The optimized version uses an explicit stack to traverse the tree iteratively, eliminating this overhead entirely. This is especially impactful in the deep tree test case, which shows 211% speedup (146μs → 47μs).
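The pattern in isolation looks roughly like the sketch below (hypothetical helper names; the real function also checks the node type and name field):

```python
# Recursive pre-order DFS: one new Python frame per visited node.
def walk_recursive(node, visit):
    if visit(node):
        return node
    for child in node.children:
        found = walk_recursive(child, visit)
        if found is not None:
            return found
    return None

# Iterative pre-order DFS: an explicit list used as a stack, no extra frames.
def walk_iterative(root, visit):
    stack = [root]
    while stack:
        node = stack.pop()
        if visit(node):
            return node
        # Push children reversed so the leftmost child is popped first,
        # preserving the left-to-right visit order of the recursive version.
        stack.extend(reversed(node.children))
    return None
```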

3. Direct Byte Comparison Instead of UTF-8 Decoding
The original code decoded byte slices to strings 12 times (source_bytes[...].decode("utf8") at 13μs). The optimized version encodes type_name to bytes once at function entry (10.7μs for 10 calls), then performs direct byte-to-byte comparison without any decoding. This is particularly effective for multibyte UTF-8 names, as shown in the UTF-8 test case with 25.1% speedup (2.24μs → 1.79μs).
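Sketched as two hypothetical helpers (names are illustrative, not the actual code):

```python
# Before: decode every candidate identifier slice, then compare strings.
def name_matches_decoding(source_bytes: bytes, start: int, end: int, type_name: str) -> bool:
    return source_bytes[start:end].decode("utf8") == type_name

# After: the target is encoded once per call; candidates are compared as raw bytes.
def name_matches_bytes(source_bytes: bytes, start: int, end: int, type_name_bytes: bytes) -> bool:
    return source_bytes[start:end] == type_name_bytes
```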

Performance Analysis by Test Case:

  • Simple cases show modest improvements (0-2μs) due to lower overhead
  • Nested/deep traversals show dramatic gains (e.g., 211% on 300-depth tree) where recursion elimination matters most
  • UTF-8 handling improves 25% by avoiding repeated decode operations
  • A few edge cases show minor regression (1-6% slower) due to stack manipulation overhead, but these are dwarfed by gains in realistic workloads

The optimization preserves exact behavior including traversal order (reversed children maintain left-to-right DFS), return types, and edge case handling while delivering significant runtime improvements especially for deep syntax trees—a common scenario when parsing Java source code.
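Putting the three changes together, a hedged reconstruction of the optimized function might look like the sketch below. It assumes a tree-sitter-style Node API (.type, .children, .child_by_field_name, .start_byte, .end_byte), as mirrored by the test stub; the mapping contents and the helper name are assumptions rather than the exact merged code:

```python
TYPE_DECLARATIONS = {  # hoisted module-level constant (assumed contents)
    "class_declaration": "class",
    "interface_declaration": "interface",
    "enum_declaration": "enum",
}

def _find_type_node_sketch(root, type_name, source_bytes):
    type_name_bytes = type_name.encode("utf8")  # encode the target once per call
    stack = [root]                              # explicit stack instead of recursion
    while stack:
        node = stack.pop()
        kind = TYPE_DECLARATIONS.get(node.type)
        if kind is not None:
            name_node = node.child_by_field_name("name")
            if (name_node is not None
                    and source_bytes[name_node.start_byte:name_node.end_byte] == type_name_bytes):
                return node, kind
        # reversed() keeps the original left-to-right depth-first order
        stack.extend(reversed(node.children))
    return None, ""  # matches the no-match contract checked by the tests
```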

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 9 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Click to see Generated Regression Tests
import pytest  # test runner for these generated tests

# function under test
from codeflash.languages.java.context import _find_type_node

# NOTE: the real tree_sitter.Node is deliberately not imported; the minimal
# stub class below stands in for it so the tests stay deterministic.

class Node:
    """
    Minimal Node implementation that replicates only the attributes and methods
    used by the original _find_type_node function:
      - .type (string)
      - .children (iterable of Node)
      - .child_by_field_name(field_name) -> Node | None
      - .start_byte, .end_byte (ints)
    This is intentionally small and deterministic for unit testing.
    """

    def __init__(self, type: str, start_byte: int = 0, end_byte: int = 0):
        # node type string, e.g., "class_declaration"
        self.type = type
        # byte positions used to slice the provided source_bytes
        self.start_byte = start_byte
        self.end_byte = end_byte
        # list of child Node instances (iteration by the function)
        self.children: list[Node] = []
        # mapping for field name lookups, e.g., {"name": name_node}
        self._fields: dict[str, Node] = {}

    def child_by_field_name(self, name: str):
        # return the Node that was registered for this field name or None
        return self._fields.get(name)

    def set_field(self, name: str, node: "Node"):
        # helper to register a field name and also attach the child to children list
        self._fields[name] = node
        # Ensure child is also present in children collection for traversal
        if node not in self.children:
            self.children.append(node)

def test_basic_find_class_declaration():
    # Basic scenario: a top-level class declaration with a name node that matches.
    source = b"class Hello {}"  # simple source bytes
    # create name node that points to the bytes for "Hello" (bytes[6:11])
    name_node = Node("identifier", start_byte=6, end_byte=11)
    # create class declaration node and attach the name field
    class_node = Node("class_declaration")
    class_node.set_field("name", name_node)

    # root is the class_node itself (no children)
    root = class_node

    # call the function and assert it finds the node and returns "class"
    found_node, kind = _find_type_node(root, "Hello", source) # 2.25μs -> 2.24μs (0.045% faster)

def test_find_interface_and_enum_declarations():
    # Verify function recognizes interface and enum declarations too.
    # Build a root with two children: an interface and an enum.
    source = b"interface I {}\nenum E {}"
    # name byte slices: "I" at bytes 10:11, "E" at bytes 20:21
    interface_name = Node("identifier", start_byte=10, end_byte=11)
    enum_name = Node("identifier", start_byte=20, end_byte=21)

    interface_node = Node("interface_declaration")
    interface_node.set_field("name", interface_name)

    enum_node = Node("enum_declaration")
    enum_node.set_field("name", enum_name)

    root = Node("program")
    root.children = [interface_node, enum_node]

    found_interface, kind_interface = _find_type_node(root, "I", source) # 2.41μs -> 2.38μs (1.26% faster)

    found_enum, kind_enum = _find_type_node(root, "E", source) # 2.51μs -> 2.56μs (1.99% slower)

def test_nested_declaration_deep_search():
    # Edge: the target declaration is nested multiple levels deep.
    # Ensure recursive traversal finds deeply nested type declarations.
    source_text = "outer { class Inner {} }"
    source = source_text.encode("utf8")
    # Create nested nodes: root -> level1 -> level2 -> class_declaration (Inner)
    inner_name = Node("identifier", start_byte=14, end_byte=19)  # "Inner"
    inner_class = Node("class_declaration")
    inner_class.set_field("name", inner_name)

    level2 = Node("block")
    level2.children = [inner_class]

    level1 = Node("block")
    level1.children = [level2]

    root = Node("program")
    root.children = [level1]

    found, kind = _find_type_node(root, "Inner", source) # 3.26μs -> 2.89μs (12.8% faster)

def test_missing_name_field_does_not_crash_and_returns_none():
    # Edge: a class_declaration node without a name field should be skipped.
    # Create a class_declaration node that has no "name" field, and nothing else matches.
    source = b"class {}"
    unnamed_class = Node("class_declaration")  # no name_field set
    root = Node("program")
    root.children = [unnamed_class]

    found, kind = _find_type_node(root, "Whatever", source) # 1.67μs -> 1.74μs (4.07% slower)

def test_utf8_multibyte_name_matching():
    # Edge: class name contains multibyte UTF-8 characters.
    # The function decodes using utf8; ensure characters like 'é' are handled.
    source_text = "class Café {}"
    source = source_text.encode("utf8")
    # locate "Café" start and end bytes in the encoded source
    start = source.find("Café".encode("utf8"))
    end = start + len("Café".encode("utf8"))
    name_node = Node("identifier", start_byte=start, end_byte=end)
    class_node = Node("class_declaration")
    class_node.set_field("name", name_node)
    root = class_node

    found, kind = _find_type_node(root, "Café", source) # 2.24μs -> 1.79μs (25.1% faster)

def test_traversal_order_first_match_returned():
    # Scenario: two sibling type declarations with the same name; the first one
    # encountered in traversal should be returned.
    # Build siblings: first is a 'class', second is an 'enum' but with the same name.
    source = b"class X {}\nenum X {}"
    # positions for both "X" names
    name1 = Node("identifier", start_byte=6, end_byte=7)   # first X
    name2 = Node("identifier", start_byte=16, end_byte=17)  # second X

    class_node = Node("class_declaration")
    class_node.set_field("name", name1)

    enum_node = Node("enum_declaration")
    enum_node.set_field("name", name2)

    root = Node("program")
    # ensure the class_node is first in children so it's found first
    root.children = [class_node, enum_node]

    found, kind = _find_type_node(root, "X", source) # 2.21μs -> 2.25μs (1.77% slower)

def test_empty_source_bytes_with_name_node_out_of_bounds():
    # Edge: name_node indicating an empty slice in an empty source; decoding yields ""
    # and should not match any non-empty search term.
    source = b""
    # start and end are both zero -> empty decoded name
    name_node = Node("identifier", start_byte=0, end_byte=0)
    class_node = Node("class_declaration")
    class_node.set_field("name", name_node)
    root = class_node

    found, kind = _find_type_node(root, "NonEmpty", source) # 2.07μs -> 2.21μs (6.37% slower)

def test_large_scale_deep_tree_performance_and_correctness():
    # Large-scale scenario: build a deep chain of nodes (but below 1000 depth)
    # and verify that the function can traverse and find the target near the leaf.
    # We'll create a depth of 300 nodes to be thorough but safe.
    depth = 300
    source_prefix = "root "
    # We'll append "Target" at the very end of the synthetic source
    tail = " class Target {}"
    source_text = source_prefix + (" nested" * depth) + tail
    source = source_text.encode("utf8")
    # Build nested chain of generic nodes
    current = Node("block")
    root = current
    # create chain
    for _ in range(depth):
        child = Node("block")
        current.children = [child]
        current = child
    # Attach the final class declaration node at the deepest level
    # Find the byte offsets for "Target" in source
    target_bytes = b"Target"
    start = source.find(target_bytes)
    end = start + len(target_bytes)
    target_name = Node("identifier", start_byte=start, end_byte=end)
    target_class = Node("class_declaration")
    target_class.set_field("name", target_name)
    # attach to the deepest node
    current.children = [target_class]

    found, kind = _find_type_node(root, "Target", source) # 146μs -> 47.0μs (211% faster)

def test_no_match_returns_none_and_empty_kind_for_entire_tree():
    # Ensure that when no type has the requested name anywhere in the tree,
    # the function returns (None, "").
    source = b"class A {} interface B {} enum C {}"
    # create nodes A, B, C with names, none matching "Z"
    a_name = Node("identifier", start_byte=6, end_byte=7)
    a = Node("class_declaration")
    a.set_field("name", a_name)

    b_name = Node("identifier", start_byte=21, end_byte=22)
    b = Node("interface_declaration")
    b.set_field("name", b_name)

    c_name = Node("identifier", start_byte=31, end_byte=32)
    c = Node("enum_declaration")
    c.set_field("name", c_name)

    root = Node("program")
    root.children = [a, b, c]

    found, kind = _find_type_node(root, "Z", source) # 4.10μs -> 4.20μs (2.38% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, `git checkout codeflash/optimize-pr1199-2026-02-02T00.25.11` and push.

codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) on Feb 2, 2026