⚡️ Speed up method AzureAISearchVectorStore.similarity_search_by_vector by 6% #58

Open
codeflash-ai[bot] wants to merge 1 commit into main from
codeflash/optimize-AzureAISearchVectorStore.similarity_search_by_vector-mglhryt4

Conversation

@codeflash-ai codeflash-ai bot commented Oct 10, 2025

📄 6% (0.06x) speedup for AzureAISearchVectorStore.similarity_search_by_vector in graphrag/vector_stores/azure_ai_search.py

⏱️ Runtime : 7.35 milliseconds 6.96 milliseconds (best of 89 runs)

📝 Explanation and details

The optimized code achieves a 5% speedup by eliminating repeated attribute lookups in the list comprehension loop.

Key optimizations:

  1. Local variable caching: The field names (self.id_field, self.text_field, etc.) are cached as local variables before the loop, avoiding repeated self. attribute lookups during iteration.

  2. Constructor reference caching: Function references for VectorStoreDocument, VectorStoreSearchResult, and json.loads are stored in local variables (vdoc_ctor, vsres_ctor, json_loads), eliminating repeated global/module-level lookups.

Why this improves performance:

  • Python's attribute lookup (self.field) and global name resolution are relatively expensive operations when performed repeatedly in tight loops
  • Local variable access is significantly faster than attribute or global lookups in Python's bytecode execution
  • The optimization is most effective when processing many documents, as shown by the larger speedups in large-scale tests (5-8% improvement with 1000 documents vs. smaller gains with few documents)
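The lookup-cost difference these bullets describe can be observed directly with a small micro-benchmark. This is an illustrative sketch, not code from the PR; the Config class, field names, and row shape are assumptions made for the demonstration:

```python
import timeit

class Config:
    """Stand-in object holding a field name, mimicking repeated self.* access."""
    def __init__(self):
        self.id_field = "id"

cfg = Config()
rows = [{"id": str(i)} for i in range(1000)]

def attr_lookup_loop():
    # Re-resolves cfg.id_field on every iteration of the comprehension.
    return [row.get(cfg.id_field, "") for row in rows]

def local_lookup_loop():
    # Resolves cfg.id_field once; the loop then reads a fast local variable.
    id_field = cfg.id_field
    return [row.get(id_field, "") for row in rows]

t_attr = timeit.timeit(attr_lookup_loop, number=1000)
t_local = timeit.timeit(local_lookup_loop, number=1000)
print(f"attribute lookup: {t_attr:.3f}s  local lookup: {t_local:.3f}s")
```

Both loops return identical results; only the number of attribute resolutions differs, which is where the per-iteration savings come from.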

Test case performance patterns:

  • Small result sets (k=0, k=1): Minimal or slight regression due to setup overhead
  • Medium result sets (k=2-10): Modest improvements (2-4%)
  • Large result sets (k=100-1000): Significant improvements (5-8%)

The optimization maintains identical functionality while reducing the per-document processing overhead in the critical list comprehension loop.
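The hoisting pattern described above can be sketched in miniature. The classes and method names here (Document, Store, parse_baseline, parse_hoisted) are illustrative stand-ins, not graphrag's real VectorStoreDocument/VectorStoreSearchResult implementation:

```python
import json

class Document:
    # Stand-in for a vector-store document record.
    def __init__(self, id, text, attributes):
        self.id = id
        self.text = text
        self.attributes = attributes

class Store:
    def __init__(self):
        self.id_field = "id"
        self.text_field = "text"
        self.attributes_field = "attributes"

    def parse_baseline(self, rows):
        # Baseline: each iteration re-resolves three self.* attributes,
        # the json.loads module attribute, and the Document global.
        return [
            Document(
                row.get(self.id_field, ""),
                row.get(self.text_field, ""),
                json.loads(row.get(self.attributes_field, "{}")),
            )
            for row in rows
        ]

    def parse_hoisted(self, rows):
        # Optimized: hoist every repeated lookup into a local before the loop,
        # mirroring the field-name and constructor caching described above.
        id_field = self.id_field
        text_field = self.text_field
        attributes_field = self.attributes_field
        json_loads = json.loads
        doc_ctor = Document
        return [
            doc_ctor(
                row.get(id_field, ""),
                row.get(text_field, ""),
                json_loads(row.get(attributes_field, "{}")),
            )
            for row in rows
        ]
```

Both methods produce identical documents; the hoisted version simply trades a handful of one-time local assignments for thousands of avoided attribute and global lookups.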

Correctness verification report:

Test                           Status
⚙️ Existing Unit Tests         🔘 None Found
🌀 Generated Regression Tests  59 Passed
⏪ Replay Tests                🔘 None Found
🔎 Concolic Coverage Tests     🔘 None Found
📊 Tests Coverage              100.0%
🌀 Generated Regression Tests and Runtime
import json
from abc import ABC
from typing import Any

# imports
import pytest  # used for our unit tests
from graphrag.vector_stores.azure_ai_search import AzureAISearchVectorStore

# Function and dependencies to test
# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License


# Minimal config class for testing
class VectorStoreSchemaConfig:
    def __init__(self, index_name, id_field, text_field, vector_field, attributes_field, vector_size):
        self.index_name = index_name
        self.id_field = id_field
        self.text_field = text_field
        self.vector_field = vector_field
        self.attributes_field = attributes_field
        self.vector_size = vector_size

class VectorStoreDocument:
    def __init__(self, id, text, vector, attributes):
        self.id = id
        self.text = text
        self.vector = vector
        self.attributes = attributes

class VectorStoreSearchResult:
    def __init__(self, document, score):
        self.document = document
        self.score = score

class BaseVectorStore(ABC):
    def __init__(
        self,
        vector_store_schema_config: VectorStoreSchemaConfig,
        db_connection: Any | None = None,
        document_collection: Any | None = None,
        query_filter: Any | None = None,
        **kwargs: Any,
    ):
        self.db_connection = db_connection
        self.document_collection = document_collection
        self.query_filter = query_filter
        self.kwargs = kwargs

        self.index_name = vector_store_schema_config.index_name
        self.id_field = vector_store_schema_config.id_field
        self.text_field = vector_store_schema_config.text_field
        self.vector_field = vector_store_schema_config.vector_field
        self.attributes_field = vector_store_schema_config.attributes_field
        self.vector_size = vector_store_schema_config.vector_size

# Mock VectorizedQuery class for testing
class VectorizedQuery:
    def __init__(self, vector, k_nearest_neighbors, fields):
        self.vector = vector
        self.k_nearest_neighbors = k_nearest_neighbors
        self.fields = fields
from graphrag.vector_stores.azure_ai_search import AzureAISearchVectorStore

# --- Test doubles ---

class DummyDBConnection:
    """A dummy db_connection that returns canned results based on the query."""
    def __init__(self, docs):
        self.docs = docs  # list of dicts representing documents

    def search(self, vector_queries):
        # For testing, just return the first k docs, simulating a search
        # Optionally, simulate a filter by vector_queries[0].k_nearest_neighbors
        k = vector_queries[0].k_nearest_neighbors
        return self.docs[:k]

# --- Fixtures ---

@pytest.fixture
def vector_store_config():
    # Provide a config for the vector store
    return VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3,
    )

@pytest.fixture
def basic_docs():
    # Provide a list of basic documents for search
    return [
        {
            "id": "doc1",
            "text": "hello world",
            "vector": [1.0, 0.0, 0.0],
            "attributes": json.dumps({"lang": "en"}),
            "@search.score": 0.99,
        },
        {
            "id": "doc2",
            "text": "foo bar",
            "vector": [0.0, 1.0, 0.0],
            "attributes": json.dumps({"lang": "en"}),
            "@search.score": 0.88,
        },
        {
            "id": "doc3",
            "text": "baz qux",
            "vector": [0.0, 0.0, 1.0],
            "attributes": json.dumps({"lang": "en"}),
            "@search.score": 0.77,
        },
    ]

# --- Basic Test Cases ---

def test_basic_search_returns_expected_results(vector_store_config, basic_docs):
    """Test that similarity_search_by_vector returns the correct number and content of results."""
    db_conn = DummyDBConnection(basic_docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    query_vec = [1.0, 0.0, 0.0]
    codeflash_output = store.similarity_search_by_vector(query_vec, k=2); results = codeflash_output # 17.0μs -> 16.5μs (3.08% faster)

def test_k_greater_than_docs(vector_store_config, basic_docs):
    """Test that requesting more neighbors than available docs returns all docs."""
    db_conn = DummyDBConnection(basic_docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([0.0, 1.0, 0.0], k=10); results = codeflash_output # 15.7μs -> 15.7μs (0.287% faster)

def test_k_zero_returns_empty(vector_store_config, basic_docs):
    """Test that requesting zero neighbors returns an empty list."""
    db_conn = DummyDBConnection(basic_docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([0.0, 1.0, 0.0], k=0); results = codeflash_output # 5.01μs -> 5.39μs (7.07% slower)

def test_empty_db_returns_empty(vector_store_config):
    """Test that searching an empty DB returns an empty list."""
    db_conn = DummyDBConnection([])
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([0.0, 1.0, 0.0], k=5); results = codeflash_output # 4.74μs -> 5.49μs (13.7% slower)

def test_attributes_field_is_missing(vector_store_config):
    """Test that missing attributes field returns empty dict."""
    docs = [{
        "id": "doc1",
        "text": "hello world",
        "vector": [1.0, 0.0, 0.0],
        # attributes field missing
        "@search.score": 0.99,
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1); results = codeflash_output # 11.9μs -> 12.1μs (1.86% slower)

def test_id_and_text_fields_missing(vector_store_config):
    """Test that missing id/text fields return empty string."""
    docs = [{
        "vector": [1.0, 0.0, 0.0],
        "attributes": json.dumps({"lang": "en"}),
        "@search.score": 0.99,
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1); results = codeflash_output # 11.4μs -> 11.4μs (0.062% slower)

def test_vector_field_missing(vector_store_config):
    """Test that missing vector field returns empty list."""
    docs = [{
        "id": "doc1",
        "text": "hello world",
        "attributes": json.dumps({"lang": "en"}),
        "@search.score": 0.99,
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1); results = codeflash_output # 10.8μs -> 11.0μs (1.34% slower)


def test_search_score_missing(vector_store_config):
    """Test that missing @search.score raises KeyError."""
    docs = [{
        "id": "doc1",
        "text": "hello world",
        "vector": [1.0, 0.0, 0.0],
        "attributes": json.dumps({"lang": "en"}),
        # "@search.score" missing
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    with pytest.raises(KeyError):
        store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1) # 15.3μs -> 15.5μs (1.11% slower)

# --- Edge Test Cases ---

def test_query_vector_empty(vector_store_config, basic_docs):
    """Test that an empty query vector is handled gracefully."""
    db_conn = DummyDBConnection(basic_docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([], k=2); results = codeflash_output # 15.4μs -> 14.8μs (4.34% faster)

def test_query_vector_wrong_size(vector_store_config, basic_docs):
    """Test that a query vector of wrong size is accepted (since DummyDBConnection ignores it)."""
    db_conn = DummyDBConnection(basic_docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    # Provide a vector of different size
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0], k=2); results = codeflash_output # 13.8μs -> 13.7μs (1.05% faster)

def test_query_vector_non_numeric(vector_store_config, basic_docs):
    """Test that a query vector with non-numeric values is accepted (since DummyDBConnection ignores it)."""
    db_conn = DummyDBConnection(basic_docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector(["a", "b", "c"], k=2); results = codeflash_output # 13.8μs -> 13.2μs (4.61% faster)

def test_db_returns_docs_with_extra_fields(vector_store_config):
    """Test that extra fields in docs are ignored."""
    docs = [{
        "id": "doc1",
        "text": "hello world",
        "vector": [1.0, 0.0, 0.0],
        "attributes": json.dumps({"lang": "en"}),
        "@search.score": 0.99,
        "extra_field": "extra_value",
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1); results = codeflash_output # 10.6μs -> 10.4μs (2.13% faster)

def test_db_returns_docs_with_non_list_vector(vector_store_config):
    """Test that a vector field that is not a list is handled as-is."""
    docs = [{
        "id": "doc1",
        "text": "hello world",
        "vector": "not a list",
        "attributes": json.dumps({"lang": "en"}),
        "@search.score": 0.99,
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1); results = codeflash_output # 10.8μs -> 10.5μs (2.72% faster)

def test_db_returns_docs_with_none_vector(vector_store_config):
    """Test that a vector field of None returns [] (default)."""
    docs = [{
        "id": "doc1",
        "text": "hello world",
        "vector": None,
        "attributes": json.dumps({"lang": "en"}),
        "@search.score": 0.99,
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1); results = codeflash_output # 10.5μs -> 10.7μs (1.17% slower)


def test_large_scale_search_returns_expected_count(vector_store_config):
    """Test that similarity_search_by_vector can handle large numbers of docs."""
    num_docs = 1000
    docs = [{
        "id": f"doc{i}",
        "text": f"text {i}",
        "vector": [float(i % 3), float((i+1) % 3), float((i+2) % 3)],
        "attributes": json.dumps({"lang": "en", "idx": i}),
        "@search.score": 1.0 - i / num_docs,
    } for i in range(num_docs)]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1000); results = codeflash_output # 1.79ms -> 1.67ms (6.99% faster)

def test_large_scale_search_returns_top_k(vector_store_config):
    """Test that similarity_search_by_vector returns only top k docs."""
    num_docs = 1000
    docs = [{
        "id": f"doc{i}",
        "text": f"text {i}",
        "vector": [float(i % 3), float((i+1) % 3), float((i+2) % 3)],
        "attributes": json.dumps({"lang": "en", "idx": i}),
        "@search.score": 1.0 - i / num_docs,
    } for i in range(num_docs)]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    k = 10
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=k); results = codeflash_output # 33.5μs -> 32.7μs (2.36% faster)
    # Should be the first k docs
    for i in range(k):
        pass

def test_large_scale_search_performance(vector_store_config):
    """Test that similarity_search_by_vector completes within reasonable time for large input."""
    import time
    num_docs = 1000
    docs = [{
        "id": f"doc{i}",
        "text": f"text {i}",
        "vector": [float(i % 3), float((i+1) % 3), float((i+2) % 3)],
        "attributes": json.dumps({"lang": "en", "idx": i}),
        "@search.score": 1.0 - i / num_docs,
    } for i in range(num_docs)]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    start = time.time()
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=num_docs); results = codeflash_output # 1.75ms -> 1.69ms (3.54% faster)
    elapsed = time.time() - start
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import json
from abc import ABC
from typing import Any

# imports
import pytest  # used for our unit tests
from graphrag.vector_stores.azure_ai_search import AzureAISearchVectorStore

# function to test
# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License


class VectorStoreSchemaConfig:
    """Mock config for vector store schema."""
    def __init__(self, index_name, id_field, text_field, vector_field, attributes_field, vector_size):
        self.index_name = index_name
        self.id_field = id_field
        self.text_field = text_field
        self.vector_field = vector_field
        self.attributes_field = attributes_field
        self.vector_size = vector_size

class VectorStoreDocument:
    """Mock document returned by vector store."""
    def __init__(self, id, text, vector, attributes):
        self.id = id
        self.text = text
        self.vector = vector
        self.attributes = attributes

class VectorStoreSearchResult:
    """Mock search result containing a document and its score."""
    def __init__(self, document, score):
        self.document = document
        self.score = score

class BaseVectorStore(ABC):
    """The base class for vector storage data-access classes."""

    def __init__(
        self,
        vector_store_schema_config: VectorStoreSchemaConfig,
        db_connection: Any | None = None,
        document_collection: Any | None = None,
        query_filter: Any | None = None,
        **kwargs: Any,
    ):
        self.db_connection = db_connection
        self.document_collection = document_collection
        self.query_filter = query_filter
        self.kwargs = kwargs

        self.index_name = vector_store_schema_config.index_name
        self.id_field = vector_store_schema_config.id_field
        self.text_field = vector_store_schema_config.text_field
        self.vector_field = vector_store_schema_config.vector_field
        self.attributes_field = vector_store_schema_config.attributes_field
        self.vector_size = vector_store_schema_config.vector_size

class VectorizedQuery:
    """Mock for Azure's VectorizedQuery."""
    def __init__(self, vector, k_nearest_neighbors, fields):
        self.vector = vector
        self.k_nearest_neighbors = k_nearest_neighbors
        self.fields = fields
from graphrag.vector_stores.azure_ai_search import AzureAISearchVectorStore

# --- Unit Tests ---

class MockDBConnection:
    """Mock DB connection for testing."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, vector_queries):
        # For simplicity, just return self.docs (simulate search result)
        # In reality, would use vector_queries to select docs
        k = vector_queries[0].k_nearest_neighbors
        return self.docs[:k]


@pytest.fixture
def vector_store():
    # Set up a vector store with mock config and mock db
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = [
        {
            "id": "doc1",
            "text": "hello world",
            "vector": [0.1, 0.2, 0.3],
            "attributes": '{"foo": "bar"}',
            "@search.score": 0.99
        },
        {
            "id": "doc2",
            "text": "python code",
            "vector": [0.2, 0.1, 0.4],
            "attributes": '{"baz": 42}',
            "@search.score": 0.88
        },
        {
            "id": "doc3",
            "text": "unit test",
            "vector": [0.3, 0.3, 0.3],
            "attributes": '{}',
            "@search.score": 0.77
        }
    ]
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    return store

# --- Basic Test Cases ---

def test_basic_search_returns_expected_results(vector_store):
    """Basic: Ensure correct results and mapping."""
    query = [0.1, 0.2, 0.3]
    codeflash_output = vector_store.similarity_search_by_vector(query, k=2); results = codeflash_output # 16.6μs -> 16.7μs (0.699% slower)

def test_basic_search_k_greater_than_docs(vector_store):
    """Basic: k larger than available docs returns all docs."""
    query = [0.1, 0.2, 0.3]
    codeflash_output = vector_store.similarity_search_by_vector(query, k=10); results = codeflash_output # 15.6μs -> 15.5μs (0.413% faster)

def test_basic_search_k_equals_1(vector_store):
    """Basic: k=1 returns only one result."""
    query = [0.1, 0.2, 0.3]
    codeflash_output = vector_store.similarity_search_by_vector(query, k=1); results = codeflash_output # 10.8μs -> 11.0μs (1.87% slower)

def test_basic_search_default_k(vector_store):
    """Basic: default k=10 returns all docs if less than k."""
    query = [0.1, 0.2, 0.3]
    codeflash_output = vector_store.similarity_search_by_vector(query); results = codeflash_output # 15.3μs -> 14.9μs (2.71% faster)

# --- Edge Test Cases ---

def test_edge_empty_document_list():
    """Edge: No documents in DB returns empty list."""
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    db_connection = MockDBConnection([])
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.1, 0.2, 0.3]
    codeflash_output = store.similarity_search_by_vector(query, k=5); results = codeflash_output # 4.88μs -> 5.22μs (6.55% slower)

def test_edge_document_missing_fields():
    """Edge: Document missing fields uses defaults."""
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = [
        {
            # missing id, text, vector, attributes
            "@search.score": 0.5
        }
    ]
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.1, 0.2, 0.3]
    codeflash_output = store.similarity_search_by_vector(query, k=1); results = codeflash_output # 15.9μs -> 16.0μs (0.922% slower)

def test_edge_document_attributes_not_json():
    """Edge: Document attributes field not valid JSON uses empty dict."""
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = [
        {
            "id": "docX",
            "text": "bad json",
            "vector": [0.1, 0.2, 0.3],
            "attributes": "{not valid json!}",
            "@search.score": 0.5
        }
    ]
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.1, 0.2, 0.3]
    # Should raise JSONDecodeError
    with pytest.raises(json.JSONDecodeError):
        store.similarity_search_by_vector(query, k=1) # 15.2μs -> 15.7μs (3.40% slower)

def test_edge_document_score_missing():
    """Edge: Document missing score raises KeyError."""
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = [
        {
            "id": "docY",
            "text": "no score",
            "vector": [0.1, 0.2, 0.3],
            "attributes": '{}'
        }
    ]
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.1, 0.2, 0.3]
    with pytest.raises(KeyError):
        store.similarity_search_by_vector(query, k=1) # 11.6μs -> 11.9μs (2.59% slower)

# --- Large Scale Test Cases ---

def test_large_scale_many_documents():
    """Large scale: Search with 1000 documents."""
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = []
    for i in range(1000):
        docs.append({
            "id": f"doc{i}",
            "text": f"text {i}",
            "vector": [float(i % 10) / 10, float((i+1) % 10) / 10, float((i+2) % 10) / 10],
            "attributes": '{"num": %d}' % i,
            "@search.score": 1.0 - (i / 1000)
        })
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.0, 0.1, 0.2]
    codeflash_output = store.similarity_search_by_vector(query, k=1000); results = codeflash_output # 1.66ms -> 1.58ms (5.00% faster)

def test_large_scale_k_less_than_docs():
    """Large scale: k < #docs returns only k results."""
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = []
    for i in range(500):
        docs.append({
            "id": f"doc{i}",
            "text": f"text {i}",
            "vector": [float(i % 10) / 10, float((i+1) % 10) / 10, float((i+2) % 10) / 10],
            "attributes": '{"num": %d}' % i,
            "@search.score": 1.0 - (i / 500)
        })
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.0, 0.1, 0.2]
    codeflash_output = store.similarity_search_by_vector(query, k=100); results = codeflash_output # 180μs -> 172μs (4.60% faster)


def test_large_scale_performance():
    """Large scale: Performance check for 1000 docs (should run quickly)."""
    import time
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = []
    for i in range(1000):
        docs.append({
            "id": f"doc{i}",
            "text": f"text {i}",
            "vector": [float(i % 10) / 10, float((i+1) % 10) / 10, float((i+2) % 10) / 10],
            "attributes": '{"num": %d}' % i,
            "@search.score": 1.0 - (i / 1000)
        })
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.1, 0.2, 0.3]
    start = time.time()
    codeflash_output = store.similarity_search_by_vector(query, k=1000); results = codeflash_output # 1.67ms -> 1.55ms (7.80% faster)
    end = time.time()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from graphrag.vector_stores.azure_ai_search import AzureAISearchVectorStore

To edit these changes, run git checkout codeflash/optimize-AzureAISearchVectorStore.similarity_search_by_vector-mglhryt4 and push.

Codeflash

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 10, 2025 23:43
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 10, 2025