Add OpenAI Responses-compatible endpoint #4582
Pull request overview
This PR adds a text-first, OpenAI Responses API–compatible endpoint (POST /v1/responses) to LMDeploy’s OpenAI server, including request normalization (string/messages/instructions/developer role), function tool mapping/tool-choice validation, and an SSE streaming event surface. It also updates middleware route protection, integrates the new router into api_server, and adds tests + documentation (including Codex integration docs).
Changes:
- Add lmdeploy/serve/openai/responses.py implementing POST /v1/responses (non-stream + SSE streaming) and related request/response models.
- Wire the new endpoint into the OpenAI API server and protect it under engine-sleep middleware.
- Add focused unit tests plus English/Chinese documentation and integration guides (Codex / Claude Code).
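The request normalization named in the overview (string input, message lists, instructions, developer role) can be pictured with a minimal sketch. The helper name `normalize_input`, the mapping of `instructions` to a leading system message, and the mapping of `developer` to `system` are assumptions for illustration, not taken from the PR's code.

```python
from typing import Any, Dict, List, Optional, Union

Message = Dict[str, Any]


def normalize_input(input_: Union[str, List[Message]],
                    instructions: Optional[str] = None) -> List[Message]:
    """Turn a Responses-style `input` into a chat-style message list.

    Hypothetical helper; the real endpoint may normalize differently.
    """
    messages: List[Message] = []
    if instructions:
        # Assumption: `instructions` becomes a leading system message.
        messages.append({'role': 'system', 'content': instructions})
    if isinstance(input_, str):
        # A bare string is treated as a single user turn.
        messages.append({'role': 'user', 'content': input_})
        return messages
    for item in input_:
        role = item.get('role', 'user')
        # Assumption: the `developer` role is folded into `system`.
        if role == 'developer':
            role = 'system'
        messages.append({'role': role, 'content': item.get('content', '')})
    return messages
```

Under these assumptions, a plain string plus instructions yields a two-message list, and a developer message ends up with the system role.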
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/test_lmdeploy/serve/openai/test_responses.py | Adds unit coverage for input normalization, tools/tool_choice validation, response shapes, and SSE event shapes. |
| lmdeploy/serve/utils/server_utils.py | Adds /v1/responses to sleeping-engine protected inference routes. |
| lmdeploy/serve/openai/responses.py | Implements the Responses-compatible router, request parsing, tool conversion, non-stream response construction, and SSE streaming events. |
| lmdeploy/serve/openai/api_server.py | Registers the new Responses router on the FastAPI app. |
| docs/zh_cn/llm/api_server.md | Links to the new Responses endpoint documentation. |
| docs/zh_cn/llm/api_server_responses.md | Documents the /v1/responses endpoint (Text V1 subset), tools, SSE events, and Codex setup notes. |
| docs/zh_cn/index.rst | Adds the Responses doc page to the Chinese toctree. |
| docs/en/llm/api_server.md | Links to the new Responses endpoint documentation. |
| docs/en/llm/api_server_responses.md | Documents the /v1/responses endpoint and points to Codex integration docs. |
| docs/en/integration/codex.md | Adds a Codex → LMDeploy /v1/responses integration guide. |
| docs/en/integration/claude_code.md | Adds a Claude Code → LMDeploy /v1/messages integration guide. |
| docs/en/index.rst | Adds the Responses doc page and a new Integrations toctree (Codex/Claude Code). |
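For readers unfamiliar with the SSE event surface mentioned above, here is a rough sketch of how one server-sent event frame is typically serialized. The function name `format_sse_event` is hypothetical, and the payload fields are illustrative; `response.output_text.delta` is an event type from the OpenAI Responses streaming API, but the exact payload this PR emits is not shown in the review.

```python
import json
from typing import Any, Dict


def format_sse_event(event_type: str, payload: Dict[str, Any]) -> str:
    """Serialize one event as an SSE frame (hypothetical helper).

    An SSE frame is an `event:` line, a `data:` line, and a blank line.
    """
    # Assumption: the event type is also repeated inside the JSON body.
    body = dict(payload, type=event_type)
    return f'event: {event_type}\ndata: {json.dumps(body)}\n\n'


frame = format_sse_event('response.output_text.delta', {'delta': 'Hi'})
```

A streaming handler would yield one such frame per event and let the HTTP layer flush them to the client.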

    ):
        if getattr(request, field_name) is not None:
            ignored_fields.append(field_name)
    if request.parallel_tool_calls is not None and request.parallel_tool_calls is not True:

What's the expected behavior if "request.parallel_tool_calls is True"?
Fixed; the behavior now follows vLLM's. When parallel_tool_calls is True or unset, all parsed tool calls are kept; when it is False, only the first tool call is kept in the final response, and only tool-call index 0 is kept when streaming.
Whether multiple calls appear is model-specific: if the model supports parallel tool calls, a single response can contain several tool calls.
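The filtering rule described in this reply can be sketched as follows. The function names `select_tool_calls` and `keep_stream_delta` are hypothetical and only illustrate the stated rule; they are not the PR's actual code.

```python
from typing import Any, List, Optional


def select_tool_calls(tool_calls: List[Any],
                      parallel_tool_calls: Optional[bool]) -> List[Any]:
    """Final (non-stream) response: pick which parsed tool calls to keep."""
    if parallel_tool_calls is False:
        # Explicit False: keep only the first parsed tool call.
        return tool_calls[:1]
    # True or None (unset): keep everything the parser produced.
    return list(tool_calls)


def keep_stream_delta(tool_call_index: int,
                      parallel_tool_calls: Optional[bool]) -> bool:
    """Streaming: decide whether a tool-call delta should be emitted."""
    return parallel_tool_calls is not False or tool_call_index == 0
```

Only an explicit `False` changes the output; `None` behaves like `True`, which matches the "True/default" wording in the reply.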

    'user',
    'presence_penalty',
    'frequency_penalty',
    'repetition_penalty',

We support "repetition_penalty", don't we?
Fixed. ResponsesRequest now accepts repetition_penalty and forwards it to GenerationConfig.
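A minimal sketch of what such forwarding could look like. `RequestSketch` and `GenConfigSketch` are hypothetical stand-ins for ResponsesRequest and GenerationConfig, and the neutral default of 1.0 is an assumption rather than something stated in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSketch:
    """Stand-in for the request model; only the relevant field is shown."""
    repetition_penalty: Optional[float] = None


@dataclass
class GenConfigSketch:
    """Stand-in for the generation config (1.0 assumed to mean 'no penalty')."""
    repetition_penalty: float = 1.0


def build_gen_config(request: RequestSketch) -> GenConfigSketch:
    config = GenConfigSketch()
    # Forward the value only when the client actually supplied one.
    if request.repetition_penalty is not None:
        config.repetition_penalty = request.repetition_penalty
    return config
```

Forwarding the field this way means it no longer needs to be listed among the ignored request fields.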

    'stream_options',
    'top_logprobs',

Are "stream_options" and "top_logprobs" different from the ones defined in openai's v1/chat/completions?

    except ValueError as err:
        return _error_response(HTTPStatus.BAD_REQUEST, str(err), param='tool_choice')

    parser_cls = getattr(server_context, 'response_parser_cls', None)

parser_cls = server_context.response_parser_cls
We don't need to check whether parser_cls is None, since it is always set by api_server's set_parsers.
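If response_parser_cls is indeed always set, the tool-parser guard can read the attribute directly and test only tool_parser_cls. The sketch below uses a hypothetical function name, `tool_parser_missing`, and `SimpleNamespace` objects as stand-ins for the real server context and parser class.

```python
from types import SimpleNamespace
from typing import Any, List, Optional


def tool_parser_missing(server_context: Any,
                        tools: Optional[List[Any]],
                        tool_choice: Any) -> bool:
    """Return True when tools are requested but no tool parser is configured."""
    # Direct attribute access: assumed to be populated at server startup.
    parser_cls = server_context.response_parser_cls
    tools_enabled = bool(tools) and tool_choice != 'none'
    return tools_enabled and parser_cls.tool_parser_cls is None


# Stand-in context whose parser class has no tool parser attached.
ctx = SimpleNamespace(response_parser_cls=SimpleNamespace(tool_parser_cls=None))
```

With this shape, a missing response_parser_cls would surface as an AttributeError at the access site instead of being silently treated as "no parser".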

    parser_cls = getattr(server_context, 'response_parser_cls', None)
    tools_enabled = tools and tool_choice != 'none'
    if tools_enabled and (parser_cls is None or parser_cls.tool_parser_cls is None):

"parser_cls is None" can be removed safely

        tools=parser_tools,
        tool_choice=tool_choice,
    )
    response_parser = parser_cls(request=openai_request, tokenizer=tokenizer)

Consider rebasing onto the main branch, since parser initialization no longer requires "tokenizer".
readthedocs build error: consider adding "openai" to docs.txt.
17455dc to 1e29354
Summary
POST /v1/responses endpoint.
Validation
- pytest tests/test_lmdeploy/serve/openai/test_responses.py -q (18 passed)
- git diff --check upstream/main...HEAD
Codex Demo
Assistance
Assisted with Codex + GPT-5.5 High