
feat(ai-rag): support multiple embedding providers, add Cohere rerank, and standardize chat interface#12941

Open
ChuanFF wants to merge 5 commits into apache:master from ChuanFF:feat-ai-rag-rerank

Conversation

@ChuanFF (Contributor) commented Jan 27, 2026

Description

This PR enhances the ai-rag plugin with multi-provider support, optional reranking capability, and a standardized chat interface that works with native LLM request formats.

Key Improvements

1. Multiple Embedding Provider Support

  • Support three embedding backends via unified schema:
    • openai – OpenAI embeddings API
    • azure-openai – Azure OpenAI Service
    • openai-compatible – Any OpenAI-compatible endpoint
  • Shared implementation via openai-base.lua with provider-specific configuration
  • Added model and dimensions options for flexibility

2. Optional Rerank Stage

  • Introduced optional reranking after vector search to improve document relevance
  • Initial implementation: Cohere Rerank
    • Configurable model and top_n parameter
    • Graceful fallback to original results on failure
  • Rerank is fully optional and only executed when configured
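
The graceful-fallback behavior described above can be sketched as follows. This is illustrative Python pseudologic, not the plugin's actual Lua code; `call_rerank` and its return shape are assumptions for the sketch.

```python
def rerank_with_fallback(call_rerank, query, docs, top_n):
    """Return reranked docs, or the original docs unchanged if the
    rerank provider fails (graceful fallback). Illustrative only."""
    try:
        # call_rerank is assumed to return document indices ordered
        # by relevance, most relevant first
        order = call_rerank(query, docs, top_n)
        return [docs[i] for i in order[:top_n]]
    except Exception:
        # Any provider failure: fall back to the vector-search order
        return docs
```

A failing rerank call thus leaves the vector-search results untouched rather than breaking the request.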

3. Standardized Chat Interface

  • Removed dependency on custom ai_rag field in request body
  • Plugin now works with standard LLM chat formats:
    {
      "messages": [
        {"role": "user", "content": "What is APISIX?"}
      ]
    }
  • Added rag_config.input_strategy to control text extraction:
    • last (default): uses only the latest user message
    • all: concatenates all user messages with newlines
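
The two `input_strategy` values can be sketched like this (illustrative Python, not the plugin's Lua implementation; the function name is hypothetical):

```python
def extract_query(messages, strategy="last"):
    """Sketch of rag_config.input_strategy:
    'last' uses only the latest user message,
    'all' joins every user message with newlines."""
    user_contents = [m["content"] for m in messages if m["role"] == "user"]
    if not user_contents:
        return None
    if strategy == "all":
        return "\n".join(user_contents)
    return user_contents[-1]  # default: "last"
```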

4. Improved Context Injection

  • Retrieved (and optionally reranked) documents are injected as contextual user messages
  • Context is inserted before the latest user message, ensuring relevance to the current query:
    Context:
    [doc1]\n\n[doc2]\n\n...
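
The insertion position can be sketched as follows (illustrative Python, assuming at least one user message is present; the actual plugin is Lua):

```python
def inject_context(messages, docs):
    """Sketch: build a 'Context:' user message from the retrieved
    documents and insert it immediately before the latest user
    message, so the LLM reads the context just ahead of the query."""
    context = {"role": "user",
               "content": "Context:\n" + "\n\n".join(docs)}
    # index of the last user message
    last_user = max(i for i, m in enumerate(messages)
                    if m["role"] == "user")
    return messages[:last_user] + [context] + messages[last_user:]
```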
    

5. Vector Search Enhancements (Azure AI Search)

  • Renamed provider to azure-ai-search (kebab-case consistency)
  • Extended configuration options:
    • fields – fields to search against
    • select – fields to return in results
    • k – number of nearest neighbors (default: 5)
    • exhaustive – whether to perform exhaustive search (default: true)
  • Returns parsed documents instead of raw response bodies
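
Returning parsed documents rather than the raw body can be sketched like this (illustrative Python; Azure AI Search responses carry hits in a `value` array, and the `select` field name comes from the plugin configuration — hits lacking that field are skipped here as a defensive assumption):

```python
def parse_search_results(response, select="content"):
    """Sketch: turn an Azure AI Search response body into a list of
    document strings by picking the configured `select` field from
    each hit, skipping hits that lack it."""
    docs = []
    for item in response.get("value", []):
        text = item.get(select)
        if isinstance(text, str) and text:
            docs.append(text)
    return docs
```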

6. Schema Improvements

  • Simplified plugin schema using oneOf for provider selection
  • Added descriptive field documentation
  • Removed request-side ai_rag payload requirements

Backward Compatibility

⚠️ This PR introduces breaking changes:

| Area | Before | After |
| --- | --- | --- |
| Request format | Required `ai_rag` field with nested configs | Standard `messages` array only |
| Azure OpenAI key | `azure_openai` (snake_case) | `azure-openai` (kebab-case) |
| Context position | Appended as final message | Inserted before latest user message |
| Vector search output | Raw JSON string | Table of document strings |

This redesign is intentional to support more flexible and production-ready RAG workflows.


Example Configuration

{
  "embeddings_provider": {
    "openai": {
      "endpoint": "https://api.openai.com/v1/embeddings",
      "api_key": "sk-xxx",
      "model": "text-embedding-3-large",
      "dimensions": 1536
    }
  },
  "vector_search_provider": {
    "azure-ai-search": {
      "endpoint": "https://xxx.search.windows.net/indexes/xxx/docs/search?api-version=2023-11-01",
      "api_key": "xxx",
      "fields": "vector",
      "select": "content",
      "k": 5,
      "exhaustive": true
    }
  },
  "rerank_provider": {
    "cohere": {
      "endpoint": "https://api.cohere.ai/v2/rerank",
      "api_key": "xxx",
      "model": "rerank-english-v3.0",
      "top_n": 3
    }
  },
  "rag_config": {
    "input_strategy": "last"
  }
}

Motivation

The original ai-rag implementation had several limitations:

  1. Vendor lock-in: Only supported Azure OpenAI embeddings
  2. Custom request format: Required ai_rag field, making integration with standard LLM APIs cumbersome
  3. No reranking: Retrieved documents were used as-is without relevance scoring
  4. Rigid context injection: Context was appended after the user query, which could dilute the user query's importance

This enhancement addresses these issues by:

  • Enabling multiple embedding providers with minimal code duplication
  • Supporting realistic RAG pipelines (retrieve → rerank → augment)
  • Simplifying integration with standard LLM chat APIs
  • Providing better control over context injection behavior

@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. enhancement New feature or request labels Jan 27, 2026
@Baoyuantop (Contributor)

Hi @ChuanFF, could you explain these breaking changes? Is it necessary to introduce these changes?

@ChuanFF (Contributor, Author) commented Feb 2, 2026

> Hi @ChuanFF, could you explain these breaking changes? Is it necessary to introduce these changes?

@Baoyuantop

  1. Request format: The previous plugin required the ai_rag field to be included in the request, which is not standard practice. Ideally, document retrieval should be performed on the user's question (usually the last one). This approach is directly compatible with the OpenAI API's completions interface. We can refer to other AI proxy projects like Higress and Literm for this.

  2. Azure OpenAI key: azure_openai was changed to azure-openai to maintain consistency with the ai-proxy plugin's fields. Please let me know if you prefer to keep it as azure_openai.

  3. Context position: The RAG information should be inserted before the user's question to keep the LLM focused on that question. Otherwise, in some scenarios, the LLM might treat the inserted documents as user questions, reducing the quality of its response. We can refer to other AI proxy projects for this as well.

  4. Vector search output: The previous plugin, when using azure-ai-search for document retrieval, neither filtered the fields nor parsed the response body to obtain the document content. As a result, the RAG results passed to the LLM contained a large amount of information unrelated to the retrieved documents.

Copilot AI left a comment
Pull request overview

This PR significantly enhances the ai-rag plugin by adding multi-provider support for embeddings, introducing optional reranking capability via Cohere, and standardizing the chat interface to work with native LLM request formats. The changes represent a major architectural improvement but introduce breaking changes in the API.

Changes:

  • Added support for multiple embedding providers (OpenAI, Azure OpenAI, OpenAI-compatible) through a unified schema and shared openai-base implementation
  • Implemented optional Cohere rerank capability with graceful fallback behavior
  • Removed custom ai_rag field requirement from requests, now works with standard messages array format
  • Enhanced Azure AI Search configuration with additional options (fields, select, k, exhaustive)
  • Renamed providers to use consistent kebab-case naming (azure-openai, azure-ai-search)

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| apisix/plugins/ai-rag.lua | Core plugin rewrite with schema redesign using oneOf, dynamic driver loading, standardized message handling, and context injection before the last user message |
| apisix/plugins/ai-rag/embeddings/openai-base.lua | Unified embeddings implementation supporting the OpenAI API format with Bearer token authentication and model/dimensions configuration |
| apisix/plugins/ai-rag/embeddings/openai.lua | Wrapper returning openai-base for the OpenAI provider |
| apisix/plugins/ai-rag/embeddings/azure-openai.lua | Wrapper returning openai-base for the Azure OpenAI provider |
| apisix/plugins/ai-rag/embeddings/openai-compatible.lua | Wrapper returning openai-base for OpenAI-compatible providers |
| apisix/plugins/ai-rag/vector-search/azure-ai-search.lua | Enhanced Azure AI Search with configurable fields, select, k, exhaustive; returns parsed document strings instead of raw JSON |
| apisix/plugins/ai-rag/rerank/cohere.lua | New Cohere rerank implementation with graceful fallback on API failures |
| t/plugin/ai-rag.t | Comprehensive test suite with mock servers for embeddings, vector search, and rerank; tests authentication, error handling, and context injection |
| docs/en/latest/plugins/ai-rag.md | Updated documentation with new provider configurations, attributes table, and usage examples |
| docs/zh/latest/plugins/ai-rag.md | Chinese translation of the updated documentation |
| Makefile | Added installation rules for the rerank directory |
Comments suppressed due to low confidence (5)

apisix/plugins/ai-rag/embeddings/openai-base.lua:98

  • The model field has a default value in the schema (line 38), but when constructing the request body in line 98, conf.model is used directly without checking if it's nil. If a user doesn't specify a model, the default should be applied during schema validation. However, if for any reason the default isn't applied, this could result in sending model: null to the API. Consider adding a fallback or verifying that schema defaults are always applied.

apisix/plugins/ai-rag/vector-search/azure-ai-search.lua:37

  • The description for the fields property says "Comma-separated list of fields to retrieve" (line 37), but based on Azure AI Search API documentation and the usage in line 66, this should be a single field name for vector search, not a comma-separated list. The description should be updated to accurately reflect that it's the target field name for the vector search query.

apisix/plugins/ai-rag/embeddings/openai-base.lua:99

  • The dimensions field is optional in the schema (no default, not required), but it's always included in the request body (line 99). When conf.dimensions is nil, this will result in dimensions: null in the JSON body sent to the API. While OpenAI's API may tolerate null values by ignoring them, it's cleaner to only include the dimensions field when it's explicitly set. Consider using conditional logic: if conf.dimensions then body.dimensions = conf.dimensions end.

apisix/plugins/ai-rag/vector-search/azure-ai-search.lua:105

  • The code accesses item[conf.select] (line 105) without checking if the field exists in the item. If Azure AI Search returns items without the specified select field, this could result in nil values in the docs array, which may cause issues downstream in the rerank or context injection stages. Consider adding validation to ensure the field exists, or filtering out items that don't contain the selected field.

apisix/plugins/ai-rag/vector-search/azure-ai-search.lua:46

  • The description for the select property says "field to select in the response" (singular), but the Azure AI Search API and documentation both indicate this can be a comma-separated list of fields. The implementation in line 105 (item[conf.select]) treats it as a single field key, so either the implementation should handle comma-separated values or the description should explicitly state it's a single field name.

@ChuanFF (Contributor, Author) commented Feb 28, 2026

@Baoyuantop I've addressed the feedback from Copilot in this PR. Could you please take another look?
