
feat: add MiniMax as generation backend #365

Closed
octo-patch wants to merge 1 commit into OpenBMB:main from octo-patch:feature/add-minimax-provider

Conversation


@octo-patch octo-patch commented Mar 21, 2026

Summary

Add MiniMax as a first-class LLM generation backend alongside the existing vllm, openai, and hf backends.

MiniMax provides OpenAI-compatible cloud APIs with models featuring up to 1M context windows, making them well-suited for RAG workloads that require processing large retrieved contexts.

Changes

  • servers/generation/src/generation.py — Added minimax backend with:
    • Auto-detection of MINIMAX_API_KEY environment variable
    • Temperature clamping to MiniMax's accepted (0, 1] range
    • Automatic <think>...</think> tag stripping (configurable via strip_think_tags)
    • Default model: MiniMax-M2.7 (1M context)
    • Concurrent request support with exponential backoff retry
    • Two static helper methods: _clamp_temperature() and _strip_think_tags()
  • servers/generation/parameter.yaml — Added MiniMax config section with all available options
  • examples/minimax_rag.yaml — Example RAG pipeline using MiniMax backend
  • examples/parameter/minimax_generation_parameter.yaml — Full parameter reference
  • README.md / docs/README_zh.md — Added "Supported Cloud LLM Backends" table documenting all four backends with MiniMax usage instructions
  • tests/test_minimax_generation.py — 38 unit tests covering temperature clamping, think-tag stripping, initialization, and generation
  • tests/test_minimax_integration.py — 3 integration tests (auto-skipped when MINIMAX_API_KEY is not set)
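The two static helpers named above could look like the following minimal sketch. The method names `_clamp_temperature()` and `_strip_think_tags()` come from the PR description; the signatures, the `eps` lower bound, and the surrounding class are assumptions for illustration, not the actual implementation.

```python
import re


class MiniMaxBackend:
    """Sketch of the two static helpers described in the PR (bodies assumed)."""

    @staticmethod
    def _clamp_temperature(temperature: float, eps: float = 0.01) -> float:
        # MiniMax accepts temperatures in (0, 1]; pull out-of-range
        # values back into that interval (eps keeps it strictly > 0).
        return min(max(temperature, eps), 1.0)

    @staticmethod
    def _strip_think_tags(text: str) -> str:
        # Remove <think>...</think> reasoning blocks from model output.
        return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
```

With this sketch, `_clamp_temperature(0.0)` would return `0.01`, `_clamp_temperature(1.5)` would return `1.0`, and `_strip_think_tags("<think>plan</think>answer")` would return `"answer"`.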

Supported Models

| Model | Context | Notes |
| --- | --- | --- |
| MiniMax-M2.7 | 1M tokens | Latest, default |
| MiniMax-M2.7-highspeed | 1M tokens | Fast variant |
| MiniMax-M2.5 | 256K tokens | Previous generation |
| MiniMax-M2.5-highspeed | 204K tokens | Fast, long context |

Usage

```shell
export MINIMAX_API_KEY="your-api-key"
ultrarag run examples/minimax_rag.yaml
```

Or set `backend: minimax` in your generation parameter file.
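A parameter file selecting this backend might look like the following hypothetical sketch. Only `backend: minimax`, the model name, and the `strip_think_tags` option are taken from the PR description; the exact key names and file layout are illustrative assumptions, not the actual `parameter.yaml` schema.

```yaml
# Hypothetical generation parameter fragment (field names illustrative)
backend: minimax
model: MiniMax-M2.7      # default model per the PR; 1M context
temperature: 0.7         # clamped to MiniMax's (0, 1] range by the backend
strip_think_tags: true   # remove <think>...</think> blocks from outputs
```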

Test Plan

  • 38 unit tests pass (temperature clamping, think-tag stripping, init validation, mock generation)
  • 3 integration tests pass against live MiniMax API
  • Verify no regression on existing vllm/openai/hf backends
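The auto-skip behavior of the integration tests can be sketched with the standard library as follows; the actual test suite may use a different framework, and the test body here is a placeholder.

```python
import os
import unittest

# True only when a MiniMax API key is available in the environment.
HAS_KEY = bool(os.environ.get("MINIMAX_API_KEY"))


class TestMiniMaxIntegration(unittest.TestCase):
    @unittest.skipUnless(HAS_KEY, "MINIMAX_API_KEY is not set")
    def test_live_generation(self):
        # A real integration test would call the live MiniMax API here.
        self.assertTrue(HAS_KEY)
```

Run without `MINIMAX_API_KEY` set, the test is reported as skipped rather than failed, so CI stays green for contributors without API access.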

9 files changed, 857 additions(+), 3 deletions(-)

Add MiniMax as a first-class LLM provider in the generation server,
alongside vllm, openai, and hf backends. MiniMax provides
OpenAI-compatible cloud APIs with M2.7 and M2.5 model series.

Features:
- Dedicated minimax backend with auto-detection of MINIMAX_API_KEY
- Temperature clamping to MiniMax's (0, 1] range
- Automatic <think>...</think> tag stripping (configurable)
- Default model: MiniMax-M2.7 (1M context window)
- Concurrent request support with retry logic
- Example YAML pipeline and parameter configuration
- 38 unit tests + 3 integration tests
- Documentation in both English and Chinese READMEs

Supported models: MiniMax-M2.7, MiniMax-M2.7-highspeed,
MiniMax-M2.5, MiniMax-M2.5-highspeed
@xhd0728
Collaborator

xhd0728 commented Apr 8, 2026

@octo-patch Thanks for the PR~

The main issue is that this introduces provider-specific logic and a dedicated parameter setup for MiniMax in the generation framework. We'd prefer not to maintain a separate parameter path for one specific provider, and instead keep the backend interface and configuration as unified as possible.

Therefore, we won't merge this PR in its current form.

@xhd0728 xhd0728 closed this Apr 8, 2026
