⚡️ Speed up method AzureAISearchVectorStore.similarity_search_by_vector by 6% #58

Open
codeflash-ai[bot] wants to merge 1 commit into main from
codeflash/optimize-AzureAISearchVectorStore.similarity_search_by_vector-mglhryt4

Conversation

@codeflash-ai codeflash-ai bot commented Oct 10, 2025

📄 6% (0.06x) speedup for AzureAISearchVectorStore.similarity_search_by_vector in graphrag/vector_stores/azure_ai_search.py

⏱️ Runtime : 7.35 milliseconds 6.96 milliseconds (best of 89 runs)

📝 Explanation and details

The optimized code achieves a 5% speedup by eliminating repeated attribute lookups in the list comprehension loop.

Key optimizations:

  1. Local variable caching: The field names (self.id_field, self.text_field, etc.) are cached as local variables before the loop, avoiding repeated self. attribute lookups during iteration.

  2. Constructor reference caching: Function references for VectorStoreDocument, VectorStoreSearchResult, and json.loads are stored in local variables (vdoc_ctor, vsres_ctor, json_loads), eliminating repeated global/module-level lookups.

Why this improves performance:

  • Python's attribute lookup (self.field) and global name resolution are relatively expensive operations when performed repeatedly in tight loops
  • Local variable access is significantly faster than attribute or global lookups in Python's bytecode execution
  • The optimization is most effective when processing many documents, as shown by the larger speedups in large-scale tests (5-8% improvement with 1000 documents vs. smaller gains with few documents)
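The lookup-cost difference these bullets describe can be observed directly with a small micro-benchmark. This is an illustrative sketch, not code from the PR; the Config class, field names, and row shape are assumptions made for the demonstration:

```python
import timeit

class Config:
    """Stand-in object holding a field name, mimicking repeated self.* access."""
    def __init__(self):
        self.id_field = "id"

cfg = Config()
rows = [{"id": str(i)} for i in range(1000)]

def attr_lookup_loop():
    # Re-resolves cfg.id_field on every iteration of the comprehension.
    return [row.get(cfg.id_field, "") for row in rows]

def local_lookup_loop():
    # Resolves cfg.id_field once; the loop then reads a fast local variable.
    id_field = cfg.id_field
    return [row.get(id_field, "") for row in rows]

t_attr = timeit.timeit(attr_lookup_loop, number=1000)
t_local = timeit.timeit(local_lookup_loop, number=1000)
print(f"attribute lookup: {t_attr:.3f}s  local lookup: {t_local:.3f}s")
```

Both loops return identical results; only the number of attribute resolutions differs, which is where the per-iteration savings come from.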

Test case performance patterns:

  • Small result sets (k=0, k=1): Minimal or slight regression due to setup overhead
  • Medium result sets (k=2-10): Modest improvements (2-4%)
  • Large result sets (k=100-1000): Significant improvements (5-8%)

The optimization maintains identical functionality while reducing the per-document processing overhead in the critical list comprehension loop.
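The hoisting pattern described above can be sketched in miniature. The classes and method names here (Document, Store, parse_baseline, parse_hoisted) are illustrative stand-ins, not graphrag's real VectorStoreDocument/VectorStoreSearchResult implementation:

```python
import json

class Document:
    # Stand-in for a vector-store document record.
    def __init__(self, id, text, attributes):
        self.id = id
        self.text = text
        self.attributes = attributes

class Store:
    def __init__(self):
        self.id_field = "id"
        self.text_field = "text"
        self.attributes_field = "attributes"

    def parse_baseline(self, rows):
        # Baseline: each iteration re-resolves three self.* attributes,
        # the json.loads module attribute, and the Document global.
        return [
            Document(
                row.get(self.id_field, ""),
                row.get(self.text_field, ""),
                json.loads(row.get(self.attributes_field, "{}")),
            )
            for row in rows
        ]

    def parse_hoisted(self, rows):
        # Optimized: hoist every repeated lookup into a local before the loop,
        # mirroring the field-name and constructor caching described above.
        id_field = self.id_field
        text_field = self.text_field
        attributes_field = self.attributes_field
        json_loads = json.loads
        doc_ctor = Document
        return [
            doc_ctor(
                row.get(id_field, ""),
                row.get(text_field, ""),
                json_loads(row.get(attributes_field, "{}")),
            )
            for row in rows
        ]
```

Both methods produce identical documents; the hoisted version simply trades a handful of one-time local assignments for thousands of avoided attribute and global lookups.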

Correctness verification report:

Test                           Status
⚙️ Existing Unit Tests         🔘 None Found
🌀 Generated Regression Tests  59 Passed
⏪ Replay Tests                🔘 None Found
🔎 Concolic Coverage Tests     🔘 None Found
📊 Tests Coverage              100.0%
🌀 Generated Regression Tests and Runtime
import json
from abc import ABC
from typing import Any

# imports
import pytest  # used for our unit tests
from graphrag.vector_stores.azure_ai_search import AzureAISearchVectorStore

# Function and dependencies to test
# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License


# Minimal config class for testing
class VectorStoreSchemaConfig:
    def __init__(self, index_name, id_field, text_field, vector_field, attributes_field, vector_size):
        self.index_name = index_name
        self.id_field = id_field
        self.text_field = text_field
        self.vector_field = vector_field
        self.attributes_field = attributes_field
        self.vector_size = vector_size

class VectorStoreDocument:
    def __init__(self, id, text, vector, attributes):
        self.id = id
        self.text = text
        self.vector = vector
        self.attributes = attributes

class VectorStoreSearchResult:
    def __init__(self, document, score):
        self.document = document
        self.score = score

class BaseVectorStore(ABC):
    def __init__(
        self,
        vector_store_schema_config: VectorStoreSchemaConfig,
        db_connection: Any | None = None,
        document_collection: Any | None = None,
        query_filter: Any | None = None,
        **kwargs: Any,
    ):
        self.db_connection = db_connection
        self.document_collection = document_collection
        self.query_filter = query_filter
        self.kwargs = kwargs

        self.index_name = vector_store_schema_config.index_name
        self.id_field = vector_store_schema_config.id_field
        self.text_field = vector_store_schema_config.text_field
        self.vector_field = vector_store_schema_config.vector_field
        self.attributes_field = vector_store_schema_config.attributes_field
        self.vector_size = vector_store_schema_config.vector_size

# Mock VectorizedQuery class for testing
class VectorizedQuery:
    def __init__(self, vector, k_nearest_neighbors, fields):
        self.vector = vector
        self.k_nearest_neighbors = k_nearest_neighbors
        self.fields = fields
from graphrag.vector_stores.azure_ai_search import AzureAISearchVectorStore

# --- Test doubles ---

class DummyDBConnection:
    """A dummy db_connection that returns canned results based on the query."""
    def __init__(self, docs):
        self.docs = docs  # list of dicts representing documents

    def search(self, vector_queries):
        # For testing, just return the first k docs, simulating a search
        # Optionally, simulate a filter by vector_queries[0].k_nearest_neighbors
        k = vector_queries[0].k_nearest_neighbors
        return self.docs[:k]

# --- Fixtures ---

@pytest.fixture
def vector_store_config():
    # Provide a config for the vector store
    return VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3,
    )

@pytest.fixture
def basic_docs():
    # Provide a list of basic documents for search
    return [
        {
            "id": "doc1",
            "text": "hello world",
            "vector": [1.0, 0.0, 0.0],
            "attributes": json.dumps({"lang": "en"}),
            "@search.score": 0.99,
        },
        {
            "id": "doc2",
            "text": "foo bar",
            "vector": [0.0, 1.0, 0.0],
            "attributes": json.dumps({"lang": "en"}),
            "@search.score": 0.88,
        },
        {
            "id": "doc3",
            "text": "baz qux",
            "vector": [0.0, 0.0, 1.0],
            "attributes": json.dumps({"lang": "en"}),
            "@search.score": 0.77,
        },
    ]

# --- Basic Test Cases ---

def test_basic_search_returns_expected_results(vector_store_config, basic_docs):
    """Test that similarity_search_by_vector returns the correct number and content of results."""
    db_conn = DummyDBConnection(basic_docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    query_vec = [1.0, 0.0, 0.0]
    codeflash_output = store.similarity_search_by_vector(query_vec, k=2); results = codeflash_output # 17.0μs -> 16.5μs (3.08% faster)

def test_k_greater_than_docs(vector_store_config, basic_docs):
    """Test that requesting more neighbors than available docs returns all docs."""
    db_conn = DummyDBConnection(basic_docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([0.0, 1.0, 0.0], k=10); results = codeflash_output # 15.7μs -> 15.7μs (0.287% faster)

def test_k_zero_returns_empty(vector_store_config, basic_docs):
    """Test that requesting zero neighbors returns an empty list."""
    db_conn = DummyDBConnection(basic_docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([0.0, 1.0, 0.0], k=0); results = codeflash_output # 5.01μs -> 5.39μs (7.07% slower)

def test_empty_db_returns_empty(vector_store_config):
    """Test that searching an empty DB returns an empty list."""
    db_conn = DummyDBConnection([])
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([0.0, 1.0, 0.0], k=5); results = codeflash_output # 4.74μs -> 5.49μs (13.7% slower)

def test_attributes_field_is_missing(vector_store_config):
    """Test that missing attributes field returns empty dict."""
    docs = [{
        "id": "doc1",
        "text": "hello world",
        "vector": [1.0, 0.0, 0.0],
        # attributes field missing
        "@search.score": 0.99,
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1); results = codeflash_output # 11.9μs -> 12.1μs (1.86% slower)

def test_id_and_text_fields_missing(vector_store_config):
    """Test that missing id/text fields return empty string."""
    docs = [{
        "vector": [1.0, 0.0, 0.0],
        "attributes": json.dumps({"lang": "en"}),
        "@search.score": 0.99,
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1); results = codeflash_output # 11.4μs -> 11.4μs (0.062% slower)

def test_vector_field_missing(vector_store_config):
    """Test that missing vector field returns empty list."""
    docs = [{
        "id": "doc1",
        "text": "hello world",
        "attributes": json.dumps({"lang": "en"}),
        "@search.score": 0.99,
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1); results = codeflash_output # 10.8μs -> 11.0μs (1.34% slower)


def test_search_score_missing(vector_store_config):
    """Test that missing @search.score raises KeyError."""
    docs = [{
        "id": "doc1",
        "text": "hello world",
        "vector": [1.0, 0.0, 0.0],
        "attributes": json.dumps({"lang": "en"}),
        # "@search.score" missing
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    with pytest.raises(KeyError):
        store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1) # 15.3μs -> 15.5μs (1.11% slower)

# --- Edge Test Cases ---

def test_query_vector_empty(vector_store_config, basic_docs):
    """Test that an empty query vector is handled gracefully."""
    db_conn = DummyDBConnection(basic_docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([], k=2); results = codeflash_output # 15.4μs -> 14.8μs (4.34% faster)

def test_query_vector_wrong_size(vector_store_config, basic_docs):
    """Test that a query vector of wrong size is accepted (since DummyDBConnection ignores it)."""
    db_conn = DummyDBConnection(basic_docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    # Provide a vector of different size
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0], k=2); results = codeflash_output # 13.8μs -> 13.7μs (1.05% faster)

def test_query_vector_non_numeric(vector_store_config, basic_docs):
    """Test that a query vector with non-numeric values is accepted (since DummyDBConnection ignores it)."""
    db_conn = DummyDBConnection(basic_docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector(["a", "b", "c"], k=2); results = codeflash_output # 13.8μs -> 13.2μs (4.61% faster)

def test_db_returns_docs_with_extra_fields(vector_store_config):
    """Test that extra fields in docs are ignored."""
    docs = [{
        "id": "doc1",
        "text": "hello world",
        "vector": [1.0, 0.0, 0.0],
        "attributes": json.dumps({"lang": "en"}),
        "@search.score": 0.99,
        "extra_field": "extra_value",
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1); results = codeflash_output # 10.6μs -> 10.4μs (2.13% faster)

def test_db_returns_docs_with_non_list_vector(vector_store_config):
    """Test that a vector field that is not a list is handled as-is."""
    docs = [{
        "id": "doc1",
        "text": "hello world",
        "vector": "not a list",
        "attributes": json.dumps({"lang": "en"}),
        "@search.score": 0.99,
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1); results = codeflash_output # 10.8μs -> 10.5μs (2.72% faster)

def test_db_returns_docs_with_none_vector(vector_store_config):
    """Test that a vector field of None returns [] (default)."""
    docs = [{
        "id": "doc1",
        "text": "hello world",
        "vector": None,
        "attributes": json.dumps({"lang": "en"}),
        "@search.score": 0.99,
    }]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1); results = codeflash_output # 10.5μs -> 10.7μs (1.17% slower)


def test_large_scale_search_returns_expected_count(vector_store_config):
    """Test that similarity_search_by_vector can handle large numbers of docs."""
    num_docs = 1000
    docs = [{
        "id": f"doc{i}",
        "text": f"text {i}",
        "vector": [float(i % 3), float((i+1) % 3), float((i+2) % 3)],
        "attributes": json.dumps({"lang": "en", "idx": i}),
        "@search.score": 1.0 - i / num_docs,
    } for i in range(num_docs)]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=1000); results = codeflash_output # 1.79ms -> 1.67ms (6.99% faster)

def test_large_scale_search_returns_top_k(vector_store_config):
    """Test that similarity_search_by_vector returns only top k docs."""
    num_docs = 1000
    docs = [{
        "id": f"doc{i}",
        "text": f"text {i}",
        "vector": [float(i % 3), float((i+1) % 3), float((i+2) % 3)],
        "attributes": json.dumps({"lang": "en", "idx": i}),
        "@search.score": 1.0 - i / num_docs,
    } for i in range(num_docs)]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    k = 10
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=k); results = codeflash_output # 33.5μs -> 32.7μs (2.36% faster)
    # Should be the first k docs
    for i in range(k):
        pass

def test_large_scale_search_performance(vector_store_config):
    """Test that similarity_search_by_vector completes within reasonable time for large input."""
    import time
    num_docs = 1000
    docs = [{
        "id": f"doc{i}",
        "text": f"text {i}",
        "vector": [float(i % 3), float((i+1) % 3), float((i+2) % 3)],
        "attributes": json.dumps({"lang": "en", "idx": i}),
        "@search.score": 1.0 - i / num_docs,
    } for i in range(num_docs)]
    db_conn = DummyDBConnection(docs)
    store = AzureAISearchVectorStore(vector_store_config, db_connection=db_conn)
    start = time.time()
    codeflash_output = store.similarity_search_by_vector([1.0, 0.0, 0.0], k=num_docs); results = codeflash_output # 1.75ms -> 1.69ms (3.54% faster)
    elapsed = time.time() - start
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import json
from abc import ABC
from typing import Any

# imports
import pytest  # used for our unit tests
from graphrag.vector_stores.azure_ai_search import AzureAISearchVectorStore

# function to test
# Copyright (c) 2024 Microsoft Corporation.
# Licensed under the MIT License


class VectorStoreSchemaConfig:
    """Mock config for vector store schema."""
    def __init__(self, index_name, id_field, text_field, vector_field, attributes_field, vector_size):
        self.index_name = index_name
        self.id_field = id_field
        self.text_field = text_field
        self.vector_field = vector_field
        self.attributes_field = attributes_field
        self.vector_size = vector_size

class VectorStoreDocument:
    """Mock document returned by vector store."""
    def __init__(self, id, text, vector, attributes):
        self.id = id
        self.text = text
        self.vector = vector
        self.attributes = attributes

class VectorStoreSearchResult:
    """Mock search result containing a document and its score."""
    def __init__(self, document, score):
        self.document = document
        self.score = score

class BaseVectorStore(ABC):
    """The base class for vector storage data-access classes."""

    def __init__(
        self,
        vector_store_schema_config: VectorStoreSchemaConfig,
        db_connection: Any | None = None,
        document_collection: Any | None = None,
        query_filter: Any | None = None,
        **kwargs: Any,
    ):
        self.db_connection = db_connection
        self.document_collection = document_collection
        self.query_filter = query_filter
        self.kwargs = kwargs

        self.index_name = vector_store_schema_config.index_name
        self.id_field = vector_store_schema_config.id_field
        self.text_field = vector_store_schema_config.text_field
        self.vector_field = vector_store_schema_config.vector_field
        self.attributes_field = vector_store_schema_config.attributes_field
        self.vector_size = vector_store_schema_config.vector_size

class VectorizedQuery:
    """Mock for Azure's VectorizedQuery."""
    def __init__(self, vector, k_nearest_neighbors, fields):
        self.vector = vector
        self.k_nearest_neighbors = k_nearest_neighbors
        self.fields = fields
from graphrag.vector_stores.azure_ai_search import AzureAISearchVectorStore

# --- Unit Tests ---

class MockDBConnection:
    """Mock DB connection for testing."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, vector_queries):
        # For simplicity, just return self.docs (simulate search result)
        # In reality, would use vector_queries to select docs
        k = vector_queries[0].k_nearest_neighbors
        return self.docs[:k]


@pytest.fixture
def vector_store():
    # Set up a vector store with mock config and mock db
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = [
        {
            "id": "doc1",
            "text": "hello world",
            "vector": [0.1, 0.2, 0.3],
            "attributes": '{"foo": "bar"}',
            "@search.score": 0.99
        },
        {
            "id": "doc2",
            "text": "python code",
            "vector": [0.2, 0.1, 0.4],
            "attributes": '{"baz": 42}',
            "@search.score": 0.88
        },
        {
            "id": "doc3",
            "text": "unit test",
            "vector": [0.3, 0.3, 0.3],
            "attributes": '{}',
            "@search.score": 0.77
        }
    ]
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    return store

# --- Basic Test Cases ---

def test_basic_search_returns_expected_results(vector_store):
    """Basic: Ensure correct results and mapping."""
    query = [0.1, 0.2, 0.3]
    codeflash_output = vector_store.similarity_search_by_vector(query, k=2); results = codeflash_output # 16.6μs -> 16.7μs (0.699% slower)

def test_basic_search_k_greater_than_docs(vector_store):
    """Basic: k larger than available docs returns all docs."""
    query = [0.1, 0.2, 0.3]
    codeflash_output = vector_store.similarity_search_by_vector(query, k=10); results = codeflash_output # 15.6μs -> 15.5μs (0.413% faster)

def test_basic_search_k_equals_1(vector_store):
    """Basic: k=1 returns only one result."""
    query = [0.1, 0.2, 0.3]
    codeflash_output = vector_store.similarity_search_by_vector(query, k=1); results = codeflash_output # 10.8μs -> 11.0μs (1.87% slower)

def test_basic_search_default_k(vector_store):
    """Basic: default k=10 returns all docs if less than k."""
    query = [0.1, 0.2, 0.3]
    codeflash_output = vector_store.similarity_search_by_vector(query); results = codeflash_output # 15.3μs -> 14.9μs (2.71% faster)

# --- Edge Test Cases ---

def test_edge_empty_document_list():
    """Edge: No documents in DB returns empty list."""
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    db_connection = MockDBConnection([])
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.1, 0.2, 0.3]
    codeflash_output = store.similarity_search_by_vector(query, k=5); results = codeflash_output # 4.88μs -> 5.22μs (6.55% slower)

def test_edge_document_missing_fields():
    """Edge: Document missing fields uses defaults."""
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = [
        {
            # missing id, text, vector, attributes
            "@search.score": 0.5
        }
    ]
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.1, 0.2, 0.3]
    codeflash_output = store.similarity_search_by_vector(query, k=1); results = codeflash_output # 15.9μs -> 16.0μs (0.922% slower)

def test_edge_document_attributes_not_json():
    """Edge: Document attributes field not valid JSON uses empty dict."""
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = [
        {
            "id": "docX",
            "text": "bad json",
            "vector": [0.1, 0.2, 0.3],
            "attributes": "{not valid json!}",
            "@search.score": 0.5
        }
    ]
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.1, 0.2, 0.3]
    # Should raise JSONDecodeError
    with pytest.raises(json.JSONDecodeError):
        store.similarity_search_by_vector(query, k=1) # 15.2μs -> 15.7μs (3.40% slower)

def test_edge_document_score_missing():
    """Edge: Document missing score raises KeyError."""
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = [
        {
            "id": "docY",
            "text": "no score",
            "vector": [0.1, 0.2, 0.3],
            "attributes": '{}'
        }
    ]
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.1, 0.2, 0.3]
    with pytest.raises(KeyError):
        store.similarity_search_by_vector(query, k=1) # 11.6μs -> 11.9μs (2.59% slower)

# --- Large Scale Test Cases ---

def test_large_scale_many_documents():
    """Large scale: Search with 1000 documents."""
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = []
    for i in range(1000):
        docs.append({
            "id": f"doc{i}",
            "text": f"text {i}",
            "vector": [float(i % 10) / 10, float((i+1) % 10) / 10, float((i+2) % 10) / 10],
            "attributes": '{"num": %d}' % i,
            "@search.score": 1.0 - (i / 1000)
        })
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.0, 0.1, 0.2]
    codeflash_output = store.similarity_search_by_vector(query, k=1000); results = codeflash_output # 1.66ms -> 1.58ms (5.00% faster)

def test_large_scale_k_less_than_docs():
    """Large scale: k < #docs returns only k results."""
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = []
    for i in range(500):
        docs.append({
            "id": f"doc{i}",
            "text": f"text {i}",
            "vector": [float(i % 10) / 10, float((i+1) % 10) / 10, float((i+2) % 10) / 10],
            "attributes": '{"num": %d}' % i,
            "@search.score": 1.0 - (i / 500)
        })
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.0, 0.1, 0.2]
    codeflash_output = store.similarity_search_by_vector(query, k=100); results = codeflash_output # 180μs -> 172μs (4.60% faster)


def test_large_scale_performance():
    """Large scale: Performance check for 1000 docs (should run quickly)."""
    import time
    config = VectorStoreSchemaConfig(
        index_name="test_index",
        id_field="id",
        text_field="text",
        vector_field="vector",
        attributes_field="attributes",
        vector_size=3
    )
    docs = []
    for i in range(1000):
        docs.append({
            "id": f"doc{i}",
            "text": f"text {i}",
            "vector": [float(i % 10) / 10, float((i+1) % 10) / 10, float((i+2) % 10) / 10],
            "attributes": '{"num": %d}' % i,
            "@search.score": 1.0 - (i / 1000)
        })
    db_connection = MockDBConnection(docs)
    store = AzureAISearchVectorStore(config, db_connection=db_connection)
    query = [0.1, 0.2, 0.3]
    start = time.time()
    codeflash_output = store.similarity_search_by_vector(query, k=1000); results = codeflash_output # 1.67ms -> 1.55ms (7.80% faster)
    end = time.time()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from graphrag.vector_stores.azure_ai_search import AzureAISearchVectorStore

To edit these changes, run git checkout codeflash/optimize-AzureAISearchVectorStore.similarity_search_by_vector-mglhryt4 and push.

Codeflash

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 10, 2025 23:43
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 10, 2025