Add Responses API support and model-level routing#205
Godzilla675 wants to merge 2 commits into ericc-ch:master
Conversation
- Route gpt-5.3-codex requests through the Responses API since it doesn't work with chat/completions
- Add model(level) suffix parsing for reasoning effort control (e.g. gpt-5.3-codex(high), claude-opus-4.6(medium))
- Add direct /v1/responses endpoint for passthrough access
- Expand /v1/models to list level-suffixed variants
- Pass through Claude thinking config when level is specified
- Fix translateModelName to not clobber 4.6 model names
Pull request overview
Adds transparent routing/translation to support Copilot’s Responses API (required for gpt-5.3-codex) while preserving a Chat Completions-compatible surface, and introduces model-suffix “reasoning effort” controls plus expanded model discovery.
Changes:
- Add `/v1/responses` (and `/responses`) passthrough plus a service client for the Responses API.
- Route codex requests hitting `/v1/chat/completions` through the Responses API with request/response translation (streaming + non-streaming paths).
- Add `model(level)` parsing, apply `reasoning_effort`/Claude `thinking` merging, and expand the `/v1/models` list with level-suffixed variants.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/phase5-routing.test.ts | Adds unit tests for model-level parsing, payload normalization, and non-stream chat/responses translation + model list expansion. |
| src/services/copilot/create-responses.ts | Introduces Copilot Responses API client (streaming + non-streaming). |
| src/services/copilot/create-chat-completions.ts | Normalizes model(level) into base model + reasoning/thinking fields before upstream request. |
| src/server.ts | Registers new /responses and /v1/responses routes. |
| src/routes/responses/route.ts | Adds HTTP handler for /responses passthrough with SSE streaming support. |
| src/routes/models/route.ts | Expands model listing to include level-suffixed variants. |
| src/routes/messages/non-stream-translation.ts | Preserves Claude 4.6 model names and forwards thinking into OpenAI-shaped payload. |
| src/routes/chat-completions/responses-translation.ts | Implements chat<->responses translation utilities and a Responses-stream -> Chat-stream adapter. |
| src/routes/chat-completions/handler.ts | Adds codex routing: translate chat request -> Responses API -> translate response back (stream + non-stream). |
| src/lib/model-level.ts | Defines supported levels, per-model variants, and parsing/helpers. |
| README.md | Documents the new POST /v1/responses endpoint. |
```typescript
const parsedEvent = JSON.parse(rawEvent.data) as {
  type?: string
  delta?: string
}

if (
  parsedEvent.type === "response.output_text.delta"
  && typeof parsedEvent.delta === "string"
) {
  const chunk: ChatCompletionChunk = {
    id: completionId,
    object: "chat.completion.chunk",
    created,
    model,
    choices: [
      {
        index: 0,
        delta: {
          ...(hasEmittedContent ? {} : { role: "assistant" }),
          content: parsedEvent.delta,
        },
        finish_reason: null,
        logprobs: null,
      },
    ],
  }
  hasEmittedContent = true
  yield { data: JSON.stringify(chunk) }
}
```
translateResponsesStreamToChatStream currently ignores all Responses streaming event types except response.output_text.delta, so streamed tool/function call events (and any other deltas) will be dropped. This breaks the PR’s claim that tool calls are translated for streaming and can lead to clients never receiving tool_calls deltas / correct finish_reason. Extend the stream translator to handle function/tool-call related event types and emit the corresponding Chat Completions chunk deltas (including a final chunk with finish_reason: "tool_calls" when applicable).
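A hedged sketch of one way to extend the translator, assuming OpenAI-style Responses event names (`response.output_item.added`, `response.function_call_arguments.delta`); Copilot's event names may differ, and the real chunk assembly lives in the generator under review:

```typescript
// Sketch: map Responses tool-call events into Chat Completions tool_calls
// deltas. Event type strings follow OpenAI's Responses API and are an
// assumption about Copilot's variant.
type ResponsesEvent = {
  type?: string
  delta?: string
  item?: { type?: string; call_id?: string; name?: string }
}

type ToolCallDelta = {
  index: number
  id?: string
  type?: "function"
  function?: { name?: string; arguments?: string }
}

function toToolCallDelta(
  event: ResponsesEvent,
  index: number,
): ToolCallDelta | null {
  if (
    event.type === "response.output_item.added"
    && event.item?.type === "function_call"
  ) {
    // First chunk for a call carries the id and name; arguments start empty
    return {
      index,
      id: event.item.call_id,
      type: "function",
      function: { name: event.item.name, arguments: "" },
    }
  }
  if (
    event.type === "response.function_call_arguments.delta"
    && typeof event.delta === "string"
  ) {
    // Subsequent chunks append argument fragments for the same index
    return { index, function: { arguments: event.delta } }
  }
  return null
}
```

The enclosing generator would wrap each non-null result in a chunk's `choices[0].delta.tool_calls` array, and emit a final chunk with `finish_reason: "tool_calls"` once the stream terminates after function-call output.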
```typescript
return {
  role: message.role,
  content,
}
```
translateMessage only forwards role and content and drops chat-completions fields like tool_call_id, tool_calls, and name. If a client includes prior assistant tool calls or sends tool results (role "tool" with tool_call_id) in the conversation history, that information will be lost when converting to the Responses API, which can break multi-turn tool workflows. Preserve these fields by mapping them into the Responses input schema (including associating tool results with the correct call id).
Suggested change (replacing the `return { role, content }` shown in the snippet):

```typescript
const translated: ResponseInputMessage = {
  role: message.role,
  content,
  ...((message as any).name ? { name: (message as any).name } : {}),
  ...((message as any).tool_call_id
    ? { tool_call_id: (message as any).tool_call_id }
    : {}),
  ...(Array.isArray((message as any).tool_calls)
    ? { tool_calls: (message as any).tool_calls }
    : {}),
} as ResponseInputMessage
return translated
```
```typescript
const response = await fetch(`${copilotBaseUrl(state)}/v1/responses`, {
  method: "POST",
  headers: copilotHeaders(state),
  body: JSON.stringify(payload),
})
```
createResponses does not replicate the header logic used in createChatCompletions (vision header via copilotHeaders(state, true) and the X-Initiator agent/user classification). When codex requests are routed through Responses, vision inputs (input_image) and agent/tool conversations may behave differently or fail because the required headers aren’t set. Consider detecting vision/agent usage from payload.input and building headers consistently with the chat-completions path.
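One possible shape for the vision detection this comment suggests; the payload field names (`input`, `content`, `input_image`) follow the Responses API, and the helper name is hypothetical:

```typescript
// Sketch: detect vision content in a Responses payload so createResponses
// can pick the same header variant as the chat-completions path
// (e.g. copilotHeaders(state, true)). Field shapes are assumptions about
// this repo's payload types.
type ResponsesInputPart = { type?: string }
type ResponsesInputItem =
  | string
  | { content?: string | Array<ResponsesInputPart> }

function hasVisionInput(input: string | Array<ResponsesInputItem>): boolean {
  if (typeof input === "string") return false
  return input.some(
    (item) =>
      typeof item !== "string"
      && Array.isArray(item.content)
      && item.content.some((part) => part.type === "input_image"),
  )
}
```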
```typescript
export async function* translateResponsesStreamToChatStream(
  responseStream: AsyncIterable<{ data?: string }>,
  model: string,
): AsyncGenerator<SSEMessage> {
  const completionId = randomUUID()
  const created = Math.floor(Date.now() / 1000)
  let hasEmittedContent = false

  for await (const rawEvent of responseStream) {
    if (rawEvent.data === "[DONE]") {
      const endChunk: ChatCompletionChunk = {
        id: completionId,
        object: "chat.completion.chunk",
        created,
        model,
        choices: [
          {
            index: 0,
            delta: {},
            finish_reason: "stop",
            logprobs: null,
          },
        ],
      }
      yield { data: JSON.stringify(endChunk) }
      yield { data: "[DONE]" }
      return
    }

    if (!rawEvent.data) {
      continue
    }

    const parsedEvent = JSON.parse(rawEvent.data) as {
      type?: string
      delta?: string
    }

    if (
      parsedEvent.type === "response.output_text.delta"
      && typeof parsedEvent.delta === "string"
    ) {
      const chunk: ChatCompletionChunk = {
        id: completionId,
        object: "chat.completion.chunk",
        created,
        model,
        choices: [
          {
            index: 0,
            delta: {
              ...(hasEmittedContent ? {} : { role: "assistant" }),
              content: parsedEvent.delta,
            },
            finish_reason: null,
            logprobs: null,
          },
        ],
      }
      hasEmittedContent = true
      yield { data: JSON.stringify(chunk) }
    }
  }
}
```
There are tests for non-streaming chat<->responses translation, but the streaming translator translateResponsesStreamToChatStream is untested. Add unit tests that feed representative Responses SSE events (text deltas, function/tool-call deltas, and termination) and assert the emitted Chat Completions chunks (including role emission, tool_calls, and correct finish_reason).
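A self-contained sketch of such a test harness. The translator here is a trimmed inline copy of the diff above (text deltas and `[DONE]` only), standing in for the project's `translateResponsesStreamToChatStream`; a real test would import the project's function instead:

```typescript
// Trimmed stand-in for translateResponsesStreamToChatStream so the harness
// runs on its own; handles only text deltas and the [DONE] terminator.
type SSEMessage = { data: string }

async function* fromEvents(
  events: Array<{ data?: string }>,
): AsyncIterable<{ data?: string }> {
  for (const event of events) yield event
}

async function* translate(
  stream: AsyncIterable<{ data?: string }>,
  model: string,
): AsyncGenerator<SSEMessage> {
  let emittedRole = false
  for await (const raw of stream) {
    if (raw.data === "[DONE]") {
      yield {
        data: JSON.stringify({
          model,
          choices: [{ index: 0, delta: {}, finish_reason: "stop" }],
        }),
      }
      yield { data: "[DONE]" }
      return
    }
    if (!raw.data) continue
    const event = JSON.parse(raw.data) as { type?: string; delta?: string }
    if (event.type === "response.output_text.delta" && event.delta) {
      yield {
        data: JSON.stringify({
          model,
          choices: [
            {
              index: 0,
              delta: emittedRole
                ? { content: event.delta }
                : { role: "assistant", content: event.delta },
              finish_reason: null,
            },
          ],
        }),
      }
      emittedRole = true
    }
  }
}

// Collect every emitted SSE message so a test can assert on the sequence
async function collect(model: string, events: Array<{ data?: string }>) {
  const out: Array<SSEMessage> = []
  for await (const message of translate(fromEvents(events), model)) {
    out.push(message)
  }
  return out
}
```

A full test along these lines would also feed function-call delta events and assert the emitted `tool_calls` chunks and a `finish_reason: "tool_calls"` terminator.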
@Godzilla675 The signature for the encrypted_content is not being returned to the request object, which may affect model performance. Additionally, OpenAI is planning to deprecate the chat completion API. You can use this instead: https://github.com/caozhiyuan/copilot-api/, which supports GPT-related models in the message-api.
Currently, on GitHub Copilot, the Claude model uses the native message API, the GPT model uses the responses API, and only Gemini uses the OpenAI-compatible interface (non-standard). The OpenAI-compatible interface does not actually support transmitting thought signatures. @Godzilla675
oh ok thanks for the review, I'll fix the issues.
…uting

- Add /v1/responses endpoint proxying OpenAI Responses API format
- Add model(level) syntax for reasoning effort control (e.g. gpt-5.3-codex(high))
- Route gpt-5.3-codex requests via /v1/chat/completions through the Responses API
- Expand GET /v1/models to include level-suffixed model variants
- Fix claude-sonnet-4.6 / claude-opus-4.6 model name preservation in non-stream translation
I will add gpt-5.4 support, ok?
Broaden GPT-family internal routing to the Responses path, add GPT-5.4 support, preserve chat-to-responses metadata, improve streaming tool call translation, align Responses request headers, and add regression coverage.
This adds support for the Responses API and reasoning effort control for models that support it.
Why
gpt-5.3-codex only works through the Responses API (`/v1/responses`), not through `/v1/chat/completions`. This means any client using the standard OpenAI chat completions format can't use codex models at all. This PR fixes that by transparently converting chat completions requests to Responses API format when a codex model is requested, so clients don't need to change anything.

What changed
Responses API routing
When a request comes in to `/v1/chat/completions` with a codex model, the proxy now translates the request into Responses API format, forwards it to the Responses API, and translates the response back into chat completions format. This works for both streaming and non-streaming requests. Tool calls are also translated in both directions.
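The routing decision can be sketched as a predicate (the function name and substring check are assumptions; the actual handler may match an explicit model-family list instead):

```typescript
// Sketch: decide whether a request should be diverted to the Responses
// path. The PR keys routing off the gpt-5.3-codex model family.
function shouldUseResponsesApi(model: string): boolean {
  // Strip an optional (level) suffix before inspecting the family
  const base = model.replace(/\([a-z]+\)$/, "")
  return base.includes("codex")
}
```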
A direct `/v1/responses` passthrough endpoint is also available if you want to use the Responses API format directly without any translation.

Reasoning effort via model name suffix
You can now control reasoning effort by appending a level in parentheses to the model name, e.g. `gpt-5.3-codex(high)` or `claude-opus-4.6(medium)`.

Using the plain model name without a suffix works the same as before; no reasoning effort is set.
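A minimal sketch of the suffix parsing, modeled on what `src/lib/model-level.ts` likely does; the function name and level set are assumptions based on the variants listed in this PR:

```typescript
// Sketch: split "model(level)" into a base model and an optional level.
// Unknown suffixes are left untouched so arbitrary model names still pass
// through unchanged.
const LEVELS = ["low", "medium", "high", "xhigh"] as const
type Level = (typeof LEVELS)[number]

function parseModelLevel(model: string): { model: string; level?: Level } {
  const match = /^(.+)\(([a-z]+)\)$/.exec(model)
  if (!match) return { model }
  const level = match[2] as Level
  if (!LEVELS.includes(level)) return { model }
  return { model: match[1], level }
}
```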
Claude thinking/extended thinking support
Claude models also support the level suffix, which enables extended thinking, e.g. `claude-sonnet-4.6(high)`.
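A sketch of the thinking-config merge this section describes, assuming Anthropic's `thinking` payload shape (`type`, `budget_tokens`); the per-level budget numbers are placeholders, not values from this PR:

```typescript
// Sketch: merge a parsed level into an existing Claude thinking config
// without clobbering caller-supplied fields. LEVEL_BUDGETS is hypothetical.
const LEVEL_BUDGETS: Record<string, number> = {
  low: 4096,
  medium: 8192,
  high: 16384,
}

type ThinkingConfig = { type?: string; budget_tokens?: number }

function mergeThinking(
  existing: ThinkingConfig | undefined,
  level: string,
): ThinkingConfig {
  return {
    type: "enabled",
    budget_tokens: LEVEL_BUDGETS[level] ?? 8192,
    // Caller-supplied fields win, so an existing budget_tokens is preserved
    ...existing,
  }
}
```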
When you use a level suffix on a Claude model, the proxy sets both `reasoning_effort` and the `thinking` configuration that Claude expects. If you already have a `thinking` object in your payload (e.g. with `budget_tokens`), the proxy merges the level into it rather than overwriting it.
`/v1/models` now includes all the level-suffixed variants so clients that read the model list can discover them:

- `gpt-5.3-codex`, `gpt-5.3-codex(low)`, `gpt-5.3-codex(medium)`, `gpt-5.3-codex(high)`, `gpt-5.3-codex(xhigh)`
- `claude-opus-4.6`, `claude-opus-4.6(low)`, `claude-opus-4.6(medium)`, `claude-opus-4.6(high)`
- `claude-opus-4.6-fast`, `claude-opus-4.6-fast(low)`, `claude-opus-4.6-fast(medium)`, `claude-opus-4.6-fast(high)`
- `claude-sonnet-4.6`, `claude-sonnet-4.6(low)`, `claude-sonnet-4.6(medium)`, `claude-sonnet-4.6(high)`

Other fixes
`translateModelName` in the Anthropic handler was incorrectly rewriting `claude-sonnet-4.6` and `claude-opus-4.6` to their non-versioned names. Fixed so 4.6 model names are preserved.

Testing
Tested against the live Copilot API with all 13 model+level combinations. All returned HTTP 200 with correct responses.