⚡️ Speed up method DriftAction.serialize by 42%#70

Open
codeflash-ai[bot] wants to merge 1 commit into main from
codeflash/optimize-DriftAction.serialize-mglscacr
@codeflash-ai codeflash-ai bot commented Oct 11, 2025

📄 42% (0.42x) speedup for DriftAction.serialize in graphrag/query/structured_search/drift_search/action.py

⏱️ Runtime: 688 microseconds → 485 microseconds (best of 407 runs)

📝 Explanation and details

The optimization adds a simple but effective check before performing the list comprehension for follow-ups serialization.

Key optimization: Instead of always executing [action.serialize() for action in self.follow_ups], the optimized version first checks if self.follow_ups: before creating the list comprehension. When self.follow_ups is empty, it directly assigns an empty list [] instead.
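The change can be sketched as follows. This is a minimal stand-in under stated assumptions, not the graphrag source: the `Action` class and the two method names are hypothetical, and the `query` and `answer` keys are assumed. Only the `metadata` and `follow_ups` keys and the `include_follow_ups` parameter are visible in the generated tests.

```python
from typing import Any, Dict, List, Optional


class Action:
    """Hypothetical stand-in for DriftAction (not the graphrag source)."""

    def __init__(
        self,
        query: str,
        answer: Optional[str] = None,
        follow_ups: Optional[List["Action"]] = None,
    ) -> None:
        self.query = query
        self.answer = answer
        self.follow_ups = follow_ups if follow_ups is not None else []
        self.metadata = {}

    def _base(self) -> Dict[str, Any]:
        # Assumed payload shape; the real method may include more fields.
        return {"query": self.query, "answer": self.answer, "metadata": self.metadata}

    def serialize_original(self, include_follow_ups: bool = True) -> Dict[str, Any]:
        data = self._base()
        if include_follow_ups:
            # Always runs the comprehension, even when the list is empty.
            data["follow_ups"] = [a.serialize_original() for a in self.follow_ups]
        return data

    def serialize_optimized(self, include_follow_ups: bool = True) -> Dict[str, Any]:
        data = self._base()
        if include_follow_ups:
            if self.follow_ups:
                data["follow_ups"] = [a.serialize_optimized() for a in self.follow_ups]
            else:
                # Fast path: nothing to serialize, so skip the comprehension.
                data["follow_ups"] = []
        return data


leaf = Action("leaf")
root = Action("root", follow_ups=[leaf])
# Both variants should produce identical output.
assert root.serialize_original() == root.serialize_optimized()
```

The guard changes only how the empty case is computed; the serialized result stays the same in both branches.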

Why this speeds up execution:

  • Avoids unnecessary list comprehension overhead: When self.follow_ups is empty (2532 of 2653 calls in the profiler data, about 95%), the original code still sets up and runs the list comprehension, even though it yields no results
  • Eliminates iterator creation costs: Evaluating a list comprehension obtains an iterator over the sequence and runs the comprehension machinery, which has measurable overhead even for empty sequences
  • Cheap guard: The truthiness check on a list is a very fast operation, so it adds little cost to the calls that do have follow-ups
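The overhead described above can be measured in isolation with `timeit`. The snippet below is an illustrative micro-benchmark, not part of the PR; the function names are hypothetical, and absolute numbers depend on the machine and interpreter. On CPython 3.12 and later, comprehensions are inlined (PEP 709), so the gap may be smaller than on older versions.

```python
import timeit

empty = []


def always_comprehend(items):
    # Mirrors the original pattern: the comprehension runs unconditionally.
    return [x for x in items]


def guarded(items):
    # Mirrors the optimized pattern: skip the comprehension for empty input.
    return [x for x in items] if items else []


n = 200_000
t_plain = timeit.timeit(lambda: always_comprehend(empty), number=n)
t_guard = timeit.timeit(lambda: guarded(empty), number=n)
print(f"comprehension: {t_plain:.4f}s  guarded: {t_guard:.4f}s")
```

Both functions return the same list for any input, so the comparison isolates the cost of entering the comprehension.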

Performance impact by test case type:

  • Best gains (mostly 40-50% faster, with individual tests between about 32% and 56%): Test cases with no follow-ups benefit most, as they skip the list comprehension entirely
  • Moderate gains (4-15% faster): Test cases with one or two follow-ups still benefit, because each follow-up is itself a leaf whose serialize() call takes the fast path
  • Large-scale improvements (42-50% faster): Tests with 500 to 1000 follow-ups show significant gains, because every follow-up in those tests has an empty follow_ups list, so nearly all serialize() calls take the fast path
  • Minimal impact on deeply nested cases: In a chain with one follow-up per level, only the final leaf takes the fast path and every other call pays for the extra conditional check, so results range from about 0.3% faster to 3% slower
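One way to see why the flat and deeply nested cases behave so differently is to count how many `serialize()` calls would hit an empty `follow_ups` list in each shape. The sketch below models a node as a plain list of its children; `count_calls` is a hypothetical helper for illustration, not part of graphrag.

```python
def count_calls(node):
    """Return (total_calls, calls_with_empty_follow_ups) for a nested-list tree.

    Each node is represented as the list of its child nodes.
    """
    total = 1
    empty = 0 if node else 1
    for child in node:
        child_total, child_empty = count_calls(child)
        total += child_total
        empty += child_empty
    return total, empty


# Root with 1000 leaf follow-ups, as in the large-scale tests.
flat = [[] for _ in range(1000)]

# Chain of 100 nodes with one follow-up per level, as in the deep-nesting test.
chain = []
for _ in range(99):
    chain = [chain]

print(count_calls(flat))   # (1001, 1000)
print(count_calls(chain))  # (100, 1)
```

In the flat tree almost every call can take the fast path, while in the chain only the last one can, which matches the pattern in the reported timings.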

The 42% overall speedup reflects that most calls in the profiled workload involve DriftAction objects with empty follow_ups lists, making this an effective micro-optimization.
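The headline figures follow directly from the numbers reported above; a quick check of the arithmetic:

```python
original_us = 688   # reported original runtime, in microseconds
optimized_us = 485  # reported optimized runtime, in microseconds

# Speedup expressed as the relative reduction in time per unit of work.
speedup = original_us / optimized_us - 1
print(f"{speedup:.1%}")  # 41.9%

# Share of profiled calls where follow_ups was empty.
empty_share = 2532 / 2653
print(f"{empty_share:.1%}")  # 95.4%
```

The speedup rounds to the 42% quoted in the PR title.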

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 3695 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 2 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
from typing import Any

# imports
import pytest  # used for our unit tests
from graphrag.query.structured_search.drift_search.action import DriftAction

# unit tests

# 1. Basic Test Cases

def test_serialize_basic_minimal():
    # Test minimal DriftAction: only query
    action = DriftAction(query="What is AI?")
    codeflash_output = action.serialize(); result = codeflash_output # 1.00μs -> 747ns (34.1% faster)

def test_serialize_basic_with_answer():
    # Test DriftAction with query and answer
    action = DriftAction(query="What is ML?", answer="Machine Learning")
    codeflash_output = action.serialize(); result = codeflash_output # 1.02μs -> 702ns (45.4% faster)

def test_serialize_basic_with_follow_ups():
    # Test DriftAction with one follow-up
    follow_up = DriftAction(query="Explain supervised learning.")
    action = DriftAction(query="What is ML?", answer="Machine Learning", follow_ups=[follow_up])
    codeflash_output = action.serialize(); result = codeflash_output # 1.63μs -> 1.56μs (4.16% faster)

def test_serialize_basic_multiple_follow_ups():
    # Test DriftAction with multiple follow-ups
    fu1 = DriftAction(query="What is regression?")
    fu2 = DriftAction(query="What is classification?")
    action = DriftAction(query="Types of ML tasks?", follow_ups=[fu1, fu2])
    codeflash_output = action.serialize(); result = codeflash_output # 2.03μs -> 1.80μs (12.5% faster)

def test_serialize_basic_score_and_metadata():
    # Test that score and metadata are included
    action = DriftAction(query="Test")
    action.score = 0.99
    action.metadata["llm_calls"] = 5
    codeflash_output = action.serialize(); result = codeflash_output # 941ns -> 669ns (40.7% faster)

# 2. Edge Test Cases

def test_serialize_edge_empty_query():
    # Test with empty string query
    action = DriftAction(query="")
    codeflash_output = action.serialize(); result = codeflash_output # 920ns -> 637ns (44.4% faster)

def test_serialize_edge_none_answer():
    # Test with explicit None answer
    action = DriftAction(query="Q", answer=None)
    codeflash_output = action.serialize(); result = codeflash_output # 959ns -> 656ns (46.2% faster)

def test_serialize_edge_empty_follow_ups():
    # Test with empty follow_ups list
    action = DriftAction(query="Q", follow_ups=[])
    codeflash_output = action.serialize(); result = codeflash_output # 970ns -> 660ns (47.0% faster)

def test_serialize_edge_nested_follow_ups():
    # Test with nested follow-ups
    fu2 = DriftAction(query="Nested FU", answer="Nested Answer")
    fu1 = DriftAction(query="FU", follow_ups=[fu2])
    action = DriftAction(query="Root", follow_ups=[fu1])
    codeflash_output = action.serialize(); result = codeflash_output # 1.93μs -> 1.99μs (3.02% slower)

def test_serialize_edge_include_follow_ups_false():
    # Test serialization with include_follow_ups=False
    fu = DriftAction(query="FU")
    action = DriftAction(query="Root", follow_ups=[fu])
    codeflash_output = action.serialize(include_follow_ups=False); result = codeflash_output # 708ns -> 694ns (2.02% faster)

def test_serialize_edge_metadata_mutation():
    # Test that metadata is a copy, not a reference (should not mutate original)
    action = DriftAction(query="Q")
    codeflash_output = action.serialize(); result1 = codeflash_output # 942ns -> 681ns (38.3% faster)
    result1["metadata"]["llm_calls"] = 42

def test_serialize_edge_score_none_and_float():
    # Test with score None and then with a float
    action = DriftAction(query="Q")
    codeflash_output = action.serialize(); result = codeflash_output # 998ns -> 671ns (48.7% faster)
    action.score = 3.1415
    codeflash_output = action.serialize(); result2 = codeflash_output # 511ns -> 375ns (36.3% faster)

def test_serialize_edge_metadata_custom_keys():
    # Test with custom metadata keys
    action = DriftAction(query="Q")
    action.metadata["custom"] = "value"
    codeflash_output = action.serialize(); result = codeflash_output # 888ns -> 635ns (39.8% faster)

def test_serialize_edge_follow_ups_with_none_answer():
    # Test follow-ups with None answer
    fu = DriftAction(query="FU", answer=None)
    action = DriftAction(query="Root", follow_ups=[fu])
    codeflash_output = action.serialize(); result = codeflash_output # 1.50μs -> 1.41μs (6.44% faster)

# 3. Large Scale Test Cases

def test_serialize_large_many_follow_ups():
    # Test serialization with 1000 follow-ups
    follow_ups = [DriftAction(query=f"FU{i}", answer=str(i)) for i in range(1000)]
    action = DriftAction(query="Root", follow_ups=follow_ups)
    codeflash_output = action.serialize(); result = codeflash_output # 255μs -> 178μs (42.9% faster)

def test_serialize_large_deep_nesting():
    # Test serialization with deep nesting (chain of 100)
    root = DriftAction(query="Root")
    current = root
    for i in range(1, 100):
        next_action = DriftAction(query=f"Level{i}")
        current.follow_ups.append(next_action)
        current = next_action
    codeflash_output = root.serialize(); result = codeflash_output # 31.4μs -> 32.0μs (1.75% slower)
    # Traverse down the nested follow_ups
    node = result
    for i in range(1, 100):
        node = node["follow_ups"][0]

def test_serialize_large_metadata():
    # Test serialization with large metadata dictionary
    action = DriftAction(query="Q")
    for i in range(500):
        action.metadata[f"key{i}"] = i
    codeflash_output = action.serialize(); result = codeflash_output # 1.10μs -> 788ns (40.2% faster)

def test_serialize_large_combined():
    # Test serialization with both large follow_ups and large metadata
    follow_ups = [DriftAction(query=f"FU{i}") for i in range(500)]
    action = DriftAction(query="Root", follow_ups=follow_ups)
    for i in range(500):
        action.metadata[f"meta{i}"] = str(i)
    codeflash_output = action.serialize(); result = codeflash_output # 120μs -> 80.5μs (50.1% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Any

# imports
import pytest  # used for our unit tests
from graphrag.query.structured_search.drift_search.action import DriftAction

# unit tests

# 1. Basic Test Cases

def test_serialize_basic_minimal():
    # Test serialization of a minimal DriftAction (only query)
    action = DriftAction(query="What is AI?")
    codeflash_output = action.serialize(); result = codeflash_output # 1.08μs -> 747ns (44.8% faster)

def test_serialize_with_answer():
    # Test serialization with an answer provided
    action = DriftAction(query="What is AI?", answer="Artificial Intelligence")
    codeflash_output = action.serialize(); result = codeflash_output # 1.02μs -> 653ns (55.9% faster)

def test_serialize_with_follow_ups():
    # Test serialization with a single follow-up action
    follow_up = DriftAction(query="Define ML", answer="Machine Learning")
    action = DriftAction(query="What is AI?", answer="Artificial Intelligence", follow_ups=[follow_up])
    codeflash_output = action.serialize(); result = codeflash_output # 1.63μs -> 1.51μs (8.00% faster)

def test_serialize_with_multiple_follow_ups():
    # Test serialization with multiple follow-up actions
    fu1 = DriftAction(query="Define ML", answer="Machine Learning")
    fu2 = DriftAction(query="Define DL", answer="Deep Learning")
    action = DriftAction(query="What is AI?", answer="Artificial Intelligence", follow_ups=[fu1, fu2])
    codeflash_output = action.serialize(); result = codeflash_output # 1.98μs -> 1.72μs (14.6% faster)

def test_serialize_exclude_follow_ups():
    # Test serialization with include_follow_ups=False
    fu1 = DriftAction(query="Define ML", answer="Machine Learning")
    action = DriftAction(query="What is AI?", answer="Artificial Intelligence", follow_ups=[fu1])
    codeflash_output = action.serialize(include_follow_ups=False); result = codeflash_output # 745ns -> 704ns (5.82% faster)

# 2. Edge Test Cases

def test_serialize_empty_query():
    # Test serialization with an empty query string
    action = DriftAction(query="")
    codeflash_output = action.serialize(); result = codeflash_output # 896ns -> 676ns (32.5% faster)

def test_serialize_none_answer_and_followups():
    # Test serialization with answer=None and follow_ups=None explicitly
    action = DriftAction(query="Test", answer=None, follow_ups=None)
    codeflash_output = action.serialize(); result = codeflash_output # 926ns -> 645ns (43.6% faster)

def test_serialize_nested_follow_ups():
    # Test serialization with nested follow-ups (multi-level)
    level3 = DriftAction(query="Level 3", answer="A3")
    level2 = DriftAction(query="Level 2", answer="A2", follow_ups=[level3])
    level1 = DriftAction(query="Level 1", answer="A1", follow_ups=[level2])
    codeflash_output = level1.serialize(); result = codeflash_output # 1.96μs -> 1.95μs (0.256% faster)

def test_serialize_score_and_metadata_mutation():
    # Test that score and metadata are included and mutable
    action = DriftAction(query="Test")
    action.score = 0.95
    action.metadata["llm_calls"] = 2
    codeflash_output = action.serialize(); result = codeflash_output # 945ns -> 640ns (47.7% faster)

def test_serialize_empty_follow_ups_list():
    # Test serialization with an explicit empty follow_ups list
    action = DriftAction(query="Test", follow_ups=[])
    codeflash_output = action.serialize(); result = codeflash_output # 931ns -> 664ns (40.2% faster)

def test_serialize_follow_ups_with_none_answer():
    # Test serialization with follow-ups whose answer is None
    fu = DriftAction(query="FU", answer=None)
    action = DriftAction(query="Test", follow_ups=[fu])
    codeflash_output = action.serialize(); result = codeflash_output # 1.49μs -> 1.41μs (6.19% faster)

def test_serialize_metadata_custom_keys():
    # Test that custom keys added to metadata are serialized
    action = DriftAction(query="Test")
    action.metadata["custom"] = "value"
    codeflash_output = action.serialize(); result = codeflash_output # 915ns -> 641ns (42.7% faster)

def test_serialize_follow_ups_exclude_nested():
    # Test that nested follow_ups are not serialized if include_follow_ups=False
    fu = DriftAction(query="FU", follow_ups=[DriftAction(query="Nested FU")])
    action = DriftAction(query="Test", follow_ups=[fu])
    codeflash_output = action.serialize(include_follow_ups=False); result = codeflash_output # 746ns -> 746ns (0.000% faster)

# 3. Large Scale Test Cases

def test_serialize_large_number_of_follow_ups():
    # Test serialization with a large number of follow-ups
    follow_ups = [DriftAction(query=f"Q{i}", answer=f"A{i}") for i in range(1000)]
    action = DriftAction(query="Root", answer="RootA", follow_ups=follow_ups)
    codeflash_output = action.serialize(); result = codeflash_output # 239μs -> 159μs (50.1% faster)

def test_serialize_deeply_nested_follow_ups():
    # Test serialization with deeply nested follow-ups (depth=10)
    current = DriftAction(query="Q10", answer="A10")
    for i in range(9, 0, -1):
        current = DriftAction(query=f"Q{i}", answer=f"A{i}", follow_ups=[current])
    codeflash_output = current.serialize(); result = codeflash_output # 4.44μs -> 4.47μs (0.582% slower)
    # Traverse down the nested structure
    node = result
    for _ in range(9):
        # Nine hops take us from depth 1 to depth 10
        node = node["follow_ups"][0]

def test_serialize_large_metadata():
    # Test serialization with large metadata dictionary
    action = DriftAction(query="Test")
    # Add 1000 custom metadata keys
    for i in range(1000):
        action.metadata[f"key{i}"] = i
    codeflash_output = action.serialize(); result = codeflash_output # 1.08μs -> 730ns (48.5% faster)

def test_serialize_large_follow_ups_exclude():
    # Test serialization with large follow_ups, but exclude them
    follow_ups = [DriftAction(query=f"Q{i}", answer=f"A{i}") for i in range(1000)]
    action = DriftAction(query="Root", answer="RootA", follow_ups=follow_ups)
    codeflash_output = action.serialize(include_follow_ups=False); result = codeflash_output # 819ns -> 784ns (4.46% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from graphrag.query.structured_search.drift_search.action import DriftAction

def test_DriftAction_serialize():
    DriftAction.serialize(DriftAction('', answer='', follow_ups=[]), include_follow_ups=True)
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
| --- | --- | --- | --- |
| codeflash_concolic_3eu3lmds/tmpwv32iw0l/test_concolic_coverage.py::test_DriftAction_serialize | 1.33μs | 977ns | 36.2% ✅ |

To edit these changes, run git checkout codeflash/optimize-DriftAction.serialize-mglscacr, commit your edits, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 11, 2025 04:38
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 11, 2025