Add OpenAI Responses-compatible endpoint#4582

Open
CUHKSZzxy wants to merge 22 commits into
InternLM:mainfrom
CUHKSZzxy:feat/responses-api-text-v1

@CUHKSZzxy
Collaborator

@CUHKSZzxy CUHKSZzxy commented May 13, 2026

Summary

  • Add a text-first OpenAI Responses-compatible POST /v1/responses endpoint.
  • Support string/message input, instructions/developer-role normalization, function tools, tool choice validation, and Responses SSE events.
  • Add focused tests, Responses API docs, and Codex integration docs.
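As a rough illustration of the request shape summarized above, the sketch below builds a POST /v1/responses call with Python's standard library. The base URL, the model name, and the helper name build_responses_request are assumptions made for illustration and are not taken from this PR; the payload fields (model, input, instructions, stream) follow the public OpenAI Responses API.

```python
import json
import urllib.request

# Assumed local server address; adjust to wherever the API server listens.
BASE_URL = "http://localhost:23333"


def build_responses_request(model, user_input, instructions=None, stream=False):
    """Build (but do not send) a POST /v1/responses request."""
    payload = {"model": model, "input": user_input, "stream": stream}
    if instructions is not None:
        payload["instructions"] = instructions
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# String input; a list of role/content messages is the other input form.
req = build_responses_request(
    model="my-model",  # hypothetical model name
    user_input="Say hello in one sentence.",
    instructions="You are a concise assistant.",
)

# To actually send it (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The request is built separately from sending so the payload can be inspected without a running server.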

Validation

  • pytest tests/test_lmdeploy/serve/openai/test_responses.py -q (18 passed)
  • git diff --check upstream/main...HEAD
  • Local Codex smoke tests against LMDeploy for no-tool, read, edit, multi-step, and project workflows.

Codex Demo

[Image: codex_sample]

Assistance

Assisted with Codex + GPT-5.5 High

@CUHKSZzxy CUHKSZzxy marked this pull request as ready for review May 13, 2026 03:30
Copilot AI review requested due to automatic review settings May 13, 2026 03:30
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a text-first, OpenAI Responses API–compatible endpoint (POST /v1/responses) to LMDeploy’s OpenAI server, including request normalization (string/messages/instructions/developer role), function tool mapping/tool-choice validation, and an SSE streaming event surface. It also updates middleware route protection, integrates the new router into api_server, and adds tests + documentation (including Codex integration docs).

Changes:

  • Add lmdeploy/serve/openai/responses.py implementing POST /v1/responses (non-stream + SSE streaming) and related request/response models.
  • Wire the new endpoint into the OpenAI API server and protect it under engine-sleep middleware.
  • Add focused unit tests plus English/Chinese documentation and integration guides (Codex / Claude Code).
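For readers unfamiliar with the SSE streaming surface mentioned above, here is a minimal client-side sketch of how such a stream could be consumed. The event names response.output_text.delta and response.completed come from the public OpenAI Responses API; which events this PR actually emits is not shown in this conversation, and the helper names iter_sse_events and collect_text are hypothetical.

```python
import json


def iter_sse_events(lines):
    """Yield (event_name, data_dict) pairs from an iterable of SSE text lines."""
    event_name, data_parts = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event.
            if data_parts:
                yield event_name, json.loads("\n".join(data_parts))
            event_name, data_parts = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_parts.append(line[len("data:"):].strip())
    # Flush a trailing event that was not followed by a blank line.
    if data_parts:
        yield event_name, json.loads("\n".join(data_parts))


def collect_text(lines):
    """Concatenate text deltas until the completion event arrives."""
    chunks = []
    for name, data in iter_sse_events(lines):
        if name == "response.output_text.delta":
            chunks.append(data["delta"])
        elif name == "response.completed":
            break
    return "".join(chunks)


# Hand-written sample stream, not captured from a real server.
sample = [
    'event: response.output_text.delta',
    'data: {"type": "response.output_text.delta", "delta": "Hel"}',
    '',
    'event: response.output_text.delta',
    'data: {"type": "response.output_text.delta", "delta": "lo"}',
    '',
    'event: response.completed',
    'data: {"type": "response.completed"}',
    '',
]
print(collect_text(sample))
```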

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/test_lmdeploy/serve/openai/test_responses.py | Adds unit coverage for input normalization, tools/tool_choice validation, response shapes, and SSE event shapes. |
| lmdeploy/serve/utils/server_utils.py | Adds /v1/responses to sleeping-engine protected inference routes. |
| lmdeploy/serve/openai/responses.py | Implements the Responses-compatible router, request parsing, tool conversion, non-stream response construction, and SSE streaming events. |
| lmdeploy/serve/openai/api_server.py | Registers the new Responses router on the FastAPI app. |
| docs/zh_cn/llm/api_server.md | Links to the new Responses endpoint documentation. |
| docs/zh_cn/llm/api_server_responses.md | Documents the /v1/responses endpoint (Text V1 subset), tools, SSE events, and Codex setup notes. |
| docs/zh_cn/index.rst | Adds the Responses doc page to the Chinese toctree. |
| docs/en/llm/api_server.md | Links to the new Responses endpoint documentation. |
| docs/en/llm/api_server_responses.md | Documents the /v1/responses endpoint and points to Codex integration docs. |
| docs/en/integration/codex.md | Adds a Codex → LMDeploy /v1/responses integration guide. |
| docs/en/integration/claude_code.md | Adds a Claude Code → LMDeploy /v1/messages integration guide. |
| docs/en/index.rst | Adds the Responses doc page and a new Integrations toctree (Codex/Claude Code). |
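The sleeping-engine route protection listed for server_utils.py can be pictured with the sketch below. It is only an illustration of the idea: the route set, the function name check_route, and the 503 status code are assumptions, since the conversation confirms only that /v1/responses was added to the protected inference routes.

```python
from http import HTTPStatus

# Hypothetical route set; only /v1/responses is confirmed by the review summary.
PROTECTED_INFERENCE_ROUTES = {
    "/v1/chat/completions",
    "/v1/completions",
    "/v1/responses",
}


def check_route(path, engine_sleeping):
    """Return an HTTP status, rejecting inference routes while the engine sleeps."""
    if engine_sleeping and path in PROTECTED_INFERENCE_ROUTES:
        # 503 is an assumed status code, not taken from the PR.
        return HTTPStatus.SERVICE_UNAVAILABLE
    return HTTPStatus.OK
```

In a real server this check would typically live in HTTP middleware so that every inference route is covered in one place.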

Comment thread lmdeploy/serve/openai/responses/serving.py Outdated
Comment thread lmdeploy/serve/openai/responses.py Outdated
Comment thread lmdeploy/serve/openai/responses/serving.py Outdated
@lvhan028 lvhan028 added the enhancement New feature or request label May 13, 2026
@lvhan028 lvhan028 self-requested a review May 23, 2026 09:46
):
    if getattr(request, field_name) is not None:
        ignored_fields.append(field_name)
if request.parallel_tool_calls is not None and request.parallel_tool_calls is not True:
Collaborator

What's the expected behavior if "request.parallel_tool_calls is True"?

Collaborator Author

Fixed. The behavior is now aligned with vLLM-style handling: when parallel_tool_calls is True or unset, all parsed tool calls are kept; when it is False, the output is filtered to the first final tool call (or to tool-call index 0 when streaming).

Whether multiple calls appear is model-specific: if the model supports parallel tool calls, a single response can contain several tool calls.
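The filtering rule described in this reply can be sketched as follows. The function names and the dictionary shape of a tool call are hypothetical; this is not the code from the PR, only one way to express the stated behavior.

```python
def select_tool_calls(tool_calls, parallel_tool_calls=None):
    """Keep all parsed tool calls unless parallel_tool_calls is explicitly False."""
    if parallel_tool_calls is False:
        # Non-streaming case: keep only the first final tool call.
        return list(tool_calls[:1])
    return list(tool_calls)


def keep_stream_delta(tool_call_index, parallel_tool_calls=None):
    """Streaming case: with parallel calls disabled, only index 0 passes through."""
    if parallel_tool_calls is False:
        return tool_call_index == 0
    return True


# Illustrative parsed tool calls.
calls = [
    {"name": "read_file", "arguments": '{"path": "a.py"}'},
    {"name": "read_file", "arguments": '{"path": "b.py"}'},
]
```

Comparing against False explicitly, rather than testing truthiness, keeps the unset (None) case on the "keep everything" path.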

'user',
'presence_penalty',
'frequency_penalty',
'repetition_penalty',
Collaborator

We support "repetition_penalty", don't we?

Collaborator Author

Fixed. ResponsesRequest now accepts repetition_penalty and forwards it to GenerationConfig.
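A minimal sketch of what forwarding a sampling field can look like is given below. SketchGenerationConfig is a hypothetical stand-in defined here for self-containment, not LMDeploy's actual GenerationConfig, and to_generation_config is an illustrative helper rather than code from the PR.

```python
from dataclasses import dataclass


@dataclass
class SketchGenerationConfig:
    """Hypothetical stand-in for the engine's generation config."""
    temperature: float = 1.0
    top_p: float = 1.0
    repetition_penalty: float = 1.0


def to_generation_config(request):
    """Copy only the sampling fields the client actually set."""
    fields = ("temperature", "top_p", "repetition_penalty")
    overrides = {k: request[k] for k in fields if request.get(k) is not None}
    return SketchGenerationConfig(**overrides)


cfg = to_generation_config(
    {"model": "my-model", "input": "hi", "repetition_penalty": 1.1}
)
```

Fields left unset by the client fall back to the config defaults instead of being overwritten with None.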

Comment on lines +86 to +87
'stream_options',
'top_logprobs',
Collaborator

Are "stream_options" and "top_logprobs" different from the ones defined in openai's v1/chat/completions?

except ValueError as err:
    return _error_response(HTTPStatus.BAD_REQUEST, str(err), param='tool_choice')

parser_cls = getattr(server_context, 'response_parser_cls', None)
Collaborator

@lvhan028 lvhan028 Jun 1, 2026

parser_cls = server_context.response_parser_cls
We don't need to check whether parser_cls is None, since it is definitely set by api_server's set_parsers.


parser_cls = getattr(server_context, 'response_parser_cls', None)
tools_enabled = tools and tool_choice != 'none'
if tools_enabled and (parser_cls is None or parser_cls.tool_parser_cls is None):
Collaborator

"parser_cls is None" can be removed safely
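With the parser_cls is None test dropped as suggested, the remaining check reduces to "tools were requested but no tool parser is configured". The sketch below illustrates that simplified condition in isolation; the function names and the error message are hypothetical, and the tool parser class is passed in directly rather than read from a server context.

```python
def tools_enabled(tools, tool_choice):
    """Tools count as enabled when any are supplied and tool_choice is not 'none'."""
    return bool(tools) and tool_choice != "none"


def check_tool_support(tools, tool_choice, tool_parser_cls):
    """Return an error message if tools are requested without a tool parser."""
    if tools_enabled(tools, tool_choice) and tool_parser_cls is None:
        # Hypothetical message; the PR's actual wording is not shown here.
        return "tools were requested but no tool parser is configured"
    return None


# Illustrative function tool definition.
example_tools = [{"type": "function", "name": "read_file"}]
```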

tools=parser_tools,
tool_choice=tool_choice,
)
response_parser = parser_cls(request=openai_request, tokenizer=tokenizer)
Collaborator

You may need to rebase onto the main branch, since parser initialization no longer requires "tokenizer".

@lvhan028
Collaborator

lvhan028 commented Jun 1, 2026

readthedocs build error:

  | from .protocol import (
  | File "/home/docs/checkouts/readthedocs.org/user_builds/lmdeploy/checkouts/4582/lmdeploy/serve/openai/responses/protocol.py", line 9, in <module>
  | from openai.types.responses import (
  | ModuleNotFoundError: No module named 'openai'

Consider adding "openai" to docs.txt.

@CUHKSZzxy CUHKSZzxy force-pushed the feat/responses-api-text-v1 branch from 17455dc to 1e29354 Compare June 2, 2026 08:44
Labels

enhancement New feature or request

3 participants