Significant slowdown in SearchClient.upload_documents(...) for large payloads #46860

@msilvestrixdatanet

  • Package Name: azure-search-documents
  • Package Version: 12.0.0
  • Operating System: Windows 11/Ubuntu 24.04
  • Python Version: 3.12

Describe the bug
After upgrading azure-search-documents from 11.6.0 to 12.0.0, we observe a significant slowdown in SearchClient.upload_documents(...) for large payloads (hundreds of documents, each with a vector field of dimension 3072).
The same slowdown occurs when invoking SearchClient.merge_documents(...).
The regression appears to be client-side: the extra time is spent before HTTP/network time becomes the dominant cost.

To Reproduce

  1. Create two clean virtual environments:

    • one with azure-search-documents==11.6.0
    • one with azure-search-documents==12.0.0
  2. Use the same Azure AI Search service and the same existing index for both runs.

    • The index must contain a vector field (e.g. content_vector) with dimension 3072.
    • Keep all index settings identical across runs.
  3. Prepare a synthetic payload with large vector-heavy documents:

    • hundreds or even thousands of documents per run (e.g. 1000)
    • each document includes:
      • key/id
      • text field (thousands of chars)
      • vector field with 3072 floats
      • a few metadata fields
  4. Run the benchmark serially (not in parallel), alternating versions:

    • run 1 with 11.6.0
    • run 2 with 12.0.0
  5. For each run, measure:

    • wall-clock time of SearchClient.upload_documents(documents=payload)
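The steps above could be scripted roughly as follows. This is only a sketch: the field names (id, content, content_vector, source, chunk), the helper names, and the endpoint/index/key arguments are placeholders that must match your own index, and run_upload_benchmark assumes azure-search-documents is installed in the active virtual environment.

```python
import random
import time


def build_payload(num_docs=1000, vector_dim=3072, text_chars=4000, seed=0):
    """Build synthetic vector-heavy documents (field names are placeholders)."""
    rng = random.Random(seed)
    return [
        {
            "id": str(i),
            "content": "x" * text_chars,
            "content_vector": [rng.random() for _ in range(vector_dim)],
            "source": "benchmark",
            "chunk": i,
        }
        for i in range(num_docs)
    ]


def time_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for one call, using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def run_upload_benchmark(endpoint, index_name, api_key, payload):
    """Time a single upload_documents call against an existing index."""
    # Imported lazily so the helpers above work without the SDK installed.
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    client = SearchClient(endpoint, index_name, AzureKeyCredential(api_key))
    _, elapsed = time_call(client.upload_documents, documents=payload)
    return elapsed
```

Running the same script once in each virtual environment, with the same seed, keeps the payload identical across the 11.6.0 and 12.0.0 runs.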

Expected behavior
12.0.0 should not introduce a major regression compared to 11.6.0 for typical bulk upload workloads with large vector fields.

Additional context
In 12.0.0, IndexDocumentsBatch._extend_batch builds each action as:

action_dict = {"@search.action": action_type}
action_dict.update(doc)
action = IndexAction(action_dict)

This goes through model conversion/serialization paths (Model.__init__, _create_value, _serialize) for each document and recursively for nested structures (including large vectors).
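To illustrate why a recursive pass over nested values can dominate for vector-heavy documents, here is a toy comparison. It is not the SDK's code: recursive_convert is a hypothetical stand-in for a per-element conversion step, and it only counts how many values such a walk has to touch for one document.

```python
def shallow_action(doc, action_type="upload"):
    """Build an action dict by shallow merge; the vector list is reused as-is."""
    action = {"@search.action": action_type}
    action.update(doc)
    return action


def recursive_convert(value, counter):
    """Toy per-element conversion pass that counts every value it visits."""
    counter[0] += 1
    if isinstance(value, dict):
        return {k: recursive_convert(v, counter) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [recursive_convert(v, counter) for v in value]
    return value


doc = {"id": "1", "content": "text", "content_vector": [0.0] * 3072}
visited = [0]
converted = recursive_convert(shallow_action(doc), visited)
print(visited[0])  # 3077 values visited for a single document
```

The shallow merge does a constant amount of work per top-level field, while the recursive walk scales with the vector dimension, so any per-value overhead is multiplied by roughly 3072 for each document.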

Likely files/methods:

  • azure/search/documents/models/_patch.py (IndexDocumentsBatch._extend_batch)
  • azure/search/documents/_utils/model_base.py (Model.__init__, _serialize)
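One way to check whether these methods account for the time is to profile a single call with the standard-library cProfile module. The profile_call helper below is a hypothetical name, offered as a sketch rather than the maintainers' method.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=20, **kwargs):
    """Run fn under cProfile and return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

For example, `_, report = profile_call(client.upload_documents, documents=payload)` followed by `print(report)` would show whether entries such as Model.__init__ or _serialize appear near the top under 12.0.0, given an existing SearchClient named client and a prepared payload.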

Labels

  • Client: This issue points to a problem in the data-plane of the library.
  • Search
  • Service Attention: Workflow: This issue is responsible by Azure service team.
  • bug: This issue requires a change to an existing behavior in the product in order to be resolved.
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • needs-team-attention: Workflow: This issue needs attention from Azure service team or SDK team
