
feat(ai-rag): support multiple embedding providers, add Cohere rerank, and standardize chat interface#12941

Open
ChuanFF wants to merge 5 commits into apache:master from ChuanFF:feat-ai-rag-rerank

Conversation

@ChuanFF (Contributor) commented Jan 27, 2026

Description

This PR enhances the ai-rag plugin with multi-provider support, optional reranking capability, and a standardized chat interface that works with native LLM request formats.

Key Improvements

1. Multiple Embedding Provider Support

  • Support three embedding backends via unified schema:
    • openai – OpenAI embeddings API
    • azure-openai – Azure OpenAI Service
    • openai-compatible – Any OpenAI-compatible endpoint
  • Shared implementation via openai-base.lua with provider-specific configuration
  • Added model and dimensions options for flexibility

2. Optional Rerank Stage

  • Introduced optional reranking after vector search to improve document relevance
  • Initial implementation: Cohere Rerank
    • Configurable model and top_n parameter
    • Graceful fallback to original results on failure
  • Rerank is fully optional and only executed when configured
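
The graceful-fallback behavior described above can be sketched as follows. This is illustrative Python pseudologic, not the plugin's actual Lua code; `call_rerank` and its return shape are assumptions for the sketch.

```python
def rerank_with_fallback(call_rerank, query, docs, top_n):
    """Return reranked docs, or the original docs unchanged if the
    rerank provider fails (graceful fallback). Illustrative only."""
    try:
        # call_rerank is assumed to return document indices ordered
        # by relevance, most relevant first
        order = call_rerank(query, docs, top_n)
        return [docs[i] for i in order[:top_n]]
    except Exception:
        # Any provider failure: fall back to the vector-search order
        return docs
```

A failing rerank call thus leaves the vector-search results untouched rather than breaking the request.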

3. Standardized Chat Interface

  • Removed dependency on custom ai_rag field in request body
  • Plugin now works with standard LLM chat formats:
    {
      "messages": [
        {"role": "user", "content": "What is APISIX?"}
      ]
    }
  • Added rag_config.input_strategy to control text extraction:
    • last (default): uses only the latest user message
    • all: concatenates all user messages with newlines
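
The two `input_strategy` values can be sketched like this (illustrative Python, not the plugin's Lua implementation; the function name is hypothetical):

```python
def extract_query(messages, strategy="last"):
    """Sketch of rag_config.input_strategy:
    'last' uses only the latest user message,
    'all' joins every user message with newlines."""
    user_contents = [m["content"] for m in messages if m["role"] == "user"]
    if not user_contents:
        return None
    if strategy == "all":
        return "\n".join(user_contents)
    return user_contents[-1]  # default: "last"
```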

4. Improved Context Injection

  • Retrieved (and optionally reranked) documents are injected as contextual user messages
  • Context is inserted before the latest user message, ensuring relevance to the current query:
    Context:
    [doc1]\n\n[doc2]\n\n...
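
The insertion position can be sketched as follows (illustrative Python, assuming at least one user message is present; the actual plugin is Lua):

```python
def inject_context(messages, docs):
    """Sketch: build a 'Context:' user message from the retrieved
    documents and insert it immediately before the latest user
    message, so the LLM reads the context just ahead of the query."""
    context = {"role": "user",
               "content": "Context:\n" + "\n\n".join(docs)}
    # index of the last user message
    last_user = max(i for i, m in enumerate(messages)
                    if m["role"] == "user")
    return messages[:last_user] + [context] + messages[last_user:]
```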
    

5. Vector Search Enhancements (Azure AI Search)

  • Renamed provider to azure-ai-search (kebab-case consistency)
  • Extended configuration options:
    • fields – fields to search against
    • select – fields to return in results
    • k – number of nearest neighbors (default: 5)
    • exhaustive – whether to perform exhaustive search (default: true)
  • Returns parsed documents instead of raw response bodies
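
Returning parsed documents rather than the raw body can be sketched like this (illustrative Python; Azure AI Search responses carry hits in a `value` array, and the `select` field name comes from the plugin configuration — hits lacking that field are skipped here as a defensive assumption):

```python
def parse_search_results(response, select="content"):
    """Sketch: turn an Azure AI Search response body into a list of
    document strings by picking the configured `select` field from
    each hit, skipping hits that lack it."""
    docs = []
    for item in response.get("value", []):
        text = item.get(select)
        if isinstance(text, str) and text:
            docs.append(text)
    return docs
```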

6. Schema Improvements

  • Simplified plugin schema using oneOf for provider selection
  • Added descriptive field documentation
  • Removed request-side ai_rag payload requirements

Backward Compatibility

⚠️ This PR introduces breaking changes:

| Area | Before | After |
| --- | --- | --- |
| Request format | Required `ai_rag` field with nested configs | Standard `messages` array only |
| Azure OpenAI key | `azure_openai` (snake_case) | `azure-openai` (kebab-case) |
| Context position | Appended as final message | Inserted before latest user message |
| Vector search output | Raw JSON string | Table of document strings |

This redesign is intentional to support more flexible and production-ready RAG workflows.


Example Configuration

{
  "embeddings_provider": {
    "openai": {
      "endpoint": "https://api.openai.com/v1/embeddings",
      "api_key": "sk-xxx",
      "model": "text-embedding-3-large",
      "dimensions": 1536
    }
  },
  "vector_search_provider": {
    "azure-ai-search": {
      "endpoint": "https://xxx.search.windows.net/indexes/xxx/docs/search?api-version=2023-11-01",
      "api_key": "xxx",
      "fields": "vector",
      "select": "content",
      "k": 5,
      "exhaustive": true
    }
  },
  "rerank_provider": {
    "cohere": {
      "endpoint": "https://api.cohere.ai/v2/rerank",
      "api_key": "xxx",
      "model": "rerank-english-v3.0",
      "top_n": 3
    }
  },
  "rag_config": {
    "input_strategy": "last"
  }
}

Motivation

The original ai-rag implementation had several limitations:

  1. Vendor lock-in: Only supported Azure OpenAI embeddings
  2. Custom request format: Required ai_rag field, making integration with standard LLM APIs cumbersome
  3. No reranking: Retrieved documents were used as-is without relevance scoring
  4. Rigid context injection: Context was appended after the user query, which could dilute the user query's importance

This enhancement addresses these issues by:

  • Enabling multiple embedding providers with minimal code duplication
  • Supporting realistic RAG pipelines (retrieve → rerank → augment)
  • Simplifying integration with standard LLM chat APIs
  • Providing better control over context injection behavior

@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. enhancement New feature or request labels Jan 27, 2026
@Baoyuantop (Contributor)

Hi @ChuanFF, could you explain these breaking changes? Is it necessary to introduce these changes?

@ChuanFF (Contributor, Author) commented Feb 2, 2026

> Hi @ChuanFF, could you explain these breaking changes? Is it necessary to introduce these changes?

@Baoyuantop

  1. Request format: The previous plugin required the ai_rag field to be included in the request, which is not standard practice. Ideally, document retrieval should be performed on the user's question (usually the last one). This approach is directly compatible with the OpenAI API's completions interface. We can refer to other AI proxy projects like Higress and Literm for this.

  2. Azure OpenAI key: azure_openai was changed to azure-openai to maintain consistency with the ai-proxy plugin's fields. Please let me know if you prefer to keep it as azure_openai.

  3. Context position: The RAG information should be inserted before the user's question to keep the LLM focused on that question. Otherwise, in some scenarios, the LLM might treat the inserted documents as user questions, reducing the quality of its response. We can refer to other AI proxy projects for this as well.

  4. Vector search output: The previous plugin, when using azure-ai-search for document retrieval, neither filtered the fields nor parsed the response body to obtain the document content. As a result, the RAG results passed to the LLM contained a large amount of information unrelated to the retrieved documents.

Copilot AI left a comment
Pull request overview

This PR significantly enhances the ai-rag plugin by adding multi-provider support for embeddings, introducing optional reranking capability via Cohere, and standardizing the chat interface to work with native LLM request formats. The changes represent a major architectural improvement but introduce breaking changes in the API.

Changes:

  • Added support for multiple embedding providers (OpenAI, Azure OpenAI, OpenAI-compatible) through a unified schema and shared openai-base implementation
  • Implemented optional Cohere rerank capability with graceful fallback behavior
  • Removed custom ai_rag field requirement from requests, now works with standard messages array format
  • Enhanced Azure AI Search configuration with additional options (fields, select, k, exhaustive)
  • Renamed providers to use consistent kebab-case naming (azure-openai, azure-ai-search)

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| apisix/plugins/ai-rag.lua | Core plugin rewrite with schema redesign using oneOf, dynamic driver loading, standardized message handling, and context injection before the last user message |
| apisix/plugins/ai-rag/embeddings/openai-base.lua | Unified embeddings implementation supporting the OpenAI API format with Bearer token authentication and model/dimensions configuration |
| apisix/plugins/ai-rag/embeddings/openai.lua | Wrapper returning openai-base for the OpenAI provider |
| apisix/plugins/ai-rag/embeddings/azure-openai.lua | Wrapper returning openai-base for the Azure OpenAI provider |
| apisix/plugins/ai-rag/embeddings/openai-compatible.lua | Wrapper returning openai-base for OpenAI-compatible providers |
| apisix/plugins/ai-rag/vector-search/azure-ai-search.lua | Enhanced Azure AI Search with configurable fields, select, k, exhaustive; returns parsed document strings instead of raw JSON |
| apisix/plugins/ai-rag/rerank/cohere.lua | New Cohere rerank implementation with graceful fallback on API failures |
| t/plugin/ai-rag.t | Comprehensive test suite with mock servers for embeddings, vector search, and rerank; tests authentication, error handling, and context injection |
| docs/en/latest/plugins/ai-rag.md | Updated documentation with new provider configurations, attributes table, and usage examples |
| docs/zh/latest/plugins/ai-rag.md | Chinese translation of the updated documentation |
| Makefile | Added installation rules for the rerank directory |
Comments suppressed due to low confidence (5)

apisix/plugins/ai-rag/embeddings/openai-base.lua:98

  • The model field has a default value in the schema (line 38), but when constructing the request body in line 98, conf.model is used directly without checking if it's nil. If a user doesn't specify a model, the default should be applied during schema validation. However, if for any reason the default isn't applied, this could result in sending model: null to the API. Consider adding a fallback or verifying that schema defaults are always applied.

apisix/plugins/ai-rag/vector-search/azure-ai-search.lua:37

  • The description for the fields property says "Comma-separated list of fields to retrieve" (line 37), but based on Azure AI Search API documentation and the usage in line 66, this should be a single field name for vector search, not a comma-separated list. The description should be updated to accurately reflect that it's the target field name for the vector search query.

apisix/plugins/ai-rag/embeddings/openai-base.lua:99

  • The dimensions field is optional in the schema (no default, not required), but it's always included in the request body (line 99). When conf.dimensions is nil, this will result in dimensions: null in the JSON body sent to the API. While OpenAI's API may tolerate null values by ignoring them, it's cleaner to only include the dimensions field when it's explicitly set. Consider using conditional logic: if conf.dimensions then body.dimensions = conf.dimensions end.

apisix/plugins/ai-rag/vector-search/azure-ai-search.lua:105

  • The code accesses item[conf.select] (line 105) without checking if the field exists in the item. If Azure AI Search returns items without the specified select field, this could result in nil values in the docs array, which may cause issues downstream in the rerank or context injection stages. Consider adding validation to ensure the field exists, or filtering out items that don't contain the selected field.

apisix/plugins/ai-rag/vector-search/azure-ai-search.lua:46

  • The description for the select property says "field to select in the response" (singular), but the Azure AI Search API and documentation both indicate this can be a comma-separated list of fields. The implementation in line 105 (item[conf.select]) treats it as a single field key, so either the implementation should handle comma-separated values or the description should explicitly state it's a single field name.

@ChuanFF (Contributor, Author) commented Feb 28, 2026

@Baoyuantop I've addressed the feedback from Copilot in this PR. Could you please take another look?
