
Add Responses API support and model-level routing#205

Open
Godzilla675 wants to merge 2 commits into ericc-ch:master from Godzilla675:responses-api-support

Conversation

@Godzilla675

This adds support for the Responses API and reasoning effort control for models that support it.

Why

gpt-5.3-codex only works through the Responses API (/v1/responses), not through /v1/chat/completions. This means any client using the standard OpenAI chat completions format can't use codex models at all. This PR fixes that by transparently converting chat completions requests to Responses API format when a codex model is requested, so clients don't need to change anything.

What changed

Responses API routing

When a request comes in to /v1/chat/completions with a codex model, the proxy now:

  1. Converts the chat completions payload to Responses API format (messages -> input, max_tokens -> max_output_tokens, content part types, etc.)
  2. Sends it to the Copilot Responses API endpoint
  3. Converts the response back to chat completions format before returning it to the client
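As a rough sketch, the request-side translation (step 1) looks like the following. Only `input` and `max_output_tokens` are field names the PR mentions explicitly; the payload shapes here are simplified illustrations, not the PR's actual types.

```typescript
// Hypothetical, simplified shapes; the real translation lives in
// src/routes/chat-completions/responses-translation.ts and also handles
// content parts, tool calls, and streaming.
interface ChatCompletionsPayload {
  model: string
  messages: Array<{ role: string; content: string }>
  max_tokens?: number
}

interface ResponsesPayload {
  model: string
  input: Array<{ role: string; content: string }>
  max_output_tokens?: number
}

function chatToResponses(chat: ChatCompletionsPayload): ResponsesPayload {
  return {
    model: chat.model,
    // messages -> input
    input: chat.messages.map((m) => ({ role: m.role, content: m.content })),
    // max_tokens -> max_output_tokens (omit if the client didn't set it)
    ...(chat.max_tokens === undefined
      ? {}
      : { max_output_tokens: chat.max_tokens }),
  }
}
```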

This works for both streaming and non-streaming requests. Tool calls are also translated in both directions.

A direct /v1/responses passthrough endpoint is also available if you want to use the Responses API format directly without any translation.

Reasoning effort via model name suffix

You can now control reasoning effort by appending a level in parentheses to the model name:

gpt-5.3-codex(high)      -> gpt-5.3-codex with reasoning_effort: "high"
gpt-5.3-codex(xhigh)     -> gpt-5.3-codex with reasoning_effort: "xhigh"
gpt-5.3-codex(low)       -> gpt-5.3-codex with reasoning_effort: "low"
gpt-5.3-codex(medium)    -> gpt-5.3-codex with reasoning_effort: "medium"

Using the plain model name without a suffix works the same as before; no reasoning effort is set.
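A minimal parser for this suffix convention could look like the following. The real helper lives in src/lib/model-level.ts; this sketch only assumes the four levels listed above and may differ from the actual implementation.

```typescript
// Levels taken from the PR description; the real implementation may
// support a different set per model.
const LEVELS = ["low", "medium", "high", "xhigh"] as const
type Level = (typeof LEVELS)[number]

// Split "gpt-5.3-codex(high)" into { model: "gpt-5.3-codex", level: "high" }.
// A name without a recognized suffix passes through unchanged.
function parseModelLevel(name: string): { model: string; level?: Level } {
  const match = /^(.+)\(([a-z]+)\)$/.exec(name)
  if (match && (LEVELS as readonly string[]).includes(match[2])) {
    return { model: match[1], level: match[2] as Level }
  }
  return { model: name }
}
```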

Claude thinking/extended thinking support

Claude models also support the level suffix, which enables extended thinking:

claude-opus-4.6(high)        -> sets reasoning_effort and thinking: { type: "enabled", effort: "high" }
claude-opus-4.6-fast(medium) -> same pattern
claude-sonnet-4.6(low)       -> same pattern

When you use a level suffix on a Claude model, the proxy sets both reasoning_effort and the thinking configuration that Claude expects. If you already have a thinking object in your payload (e.g. with budget_tokens), the proxy merges the level into it rather than overwriting it.
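That merge behavior can be sketched as follows. The field shapes follow the PR description (thinking with type/effort, plus optional budget_tokens); the exact payload types are assumptions.

```typescript
interface ThinkingConfig {
  type?: string
  effort?: string
  budget_tokens?: number
}

interface ClaudePayload {
  model: string
  reasoning_effort?: string
  thinking?: ThinkingConfig
}

// Apply a parsed level to a Claude payload: set reasoning_effort, and merge
// the level into any existing thinking object instead of replacing it, so a
// client-supplied budget_tokens survives.
function applyClaudeLevel(payload: ClaudePayload, level: string): ClaudePayload {
  return {
    ...payload,
    reasoning_effort: level,
    thinking: { type: "enabled", ...payload.thinking, effort: level },
  }
}
```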

Expanded model list

/v1/models now includes all the level-suffixed variants so clients that read the model list can discover them:

  • gpt-5.3-codex, gpt-5.3-codex(low), gpt-5.3-codex(medium), gpt-5.3-codex(high), gpt-5.3-codex(xhigh)
  • claude-opus-4.6, claude-opus-4.6(low), claude-opus-4.6(medium), claude-opus-4.6(high)
  • claude-opus-4.6-fast, claude-opus-4.6-fast(low), claude-opus-4.6-fast(medium), claude-opus-4.6-fast(high)
  • claude-sonnet-4.6, claude-sonnet-4.6(low), claude-sonnet-4.6(medium), claude-sonnet-4.6(high)
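Producing that list is mechanical; here is one way to sketch it. The per-model level sets are copied from the bullets above, but the actual source of truth is src/lib/model-level.ts.

```typescript
// Per-model level sets as listed above (assumption: mirrors the PR's list).
const MODEL_LEVELS: Record<string, string[]> = {
  "gpt-5.3-codex": ["low", "medium", "high", "xhigh"],
  "claude-opus-4.6": ["low", "medium", "high"],
  "claude-opus-4.6-fast": ["low", "medium", "high"],
  "claude-sonnet-4.6": ["low", "medium", "high"],
}

// Expand each base model into itself plus its level-suffixed variants.
function expandModelList(models: Record<string, string[]>): string[] {
  return Object.entries(models).flatMap(([base, levels]) => [
    base,
    ...levels.map((level) => `${base}(${level})`),
  ])
}
```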

Other fixes

  • translateModelName in the Anthropic handler was incorrectly rewriting claude-sonnet-4.6 and claude-opus-4.6 to their non-versioned names. Fixed so 4.6 model names are preserved.

Testing

Tested against the live Copilot API with all 13 model+level combinations. All returned HTTP 200 with correct responses.

- Route gpt-5.3-codex requests through the Responses API since it
  doesn't work with chat/completions
- Add model(level) suffix parsing for reasoning effort control
  (e.g. gpt-5.3-codex(high), claude-opus-4.6(medium))
- Add direct /v1/responses endpoint for passthrough access
- Expand /v1/models to list level-suffixed variants
- Pass through Claude thinking config when level is specified
- Fix translateModelName to not clobber 4.6 model names
Copilot AI review requested due to automatic review settings March 4, 2026 02:56
Contributor

Copilot AI left a comment


Pull request overview

Adds transparent routing/translation to support Copilot’s Responses API (required for gpt-5.3-codex) while preserving a Chat Completions-compatible surface, and introduces model-suffix “reasoning effort” controls plus expanded model discovery.

Changes:

  • Add /v1/responses (and /responses) passthrough plus service client for the Responses API.
  • Route codex requests hitting /v1/chat/completions through Responses API with request/response translation (streaming + non-streaming paths).
  • Add model(level) parsing, apply reasoning_effort/Claude thinking merging, and expand /v1/models list with level-suffixed variants.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Summary per file:

  • tests/phase5-routing.test.ts: Adds unit tests for model-level parsing, payload normalization, and non-stream chat/responses translation + model list expansion.
  • src/services/copilot/create-responses.ts: Introduces Copilot Responses API client (streaming + non-streaming).
  • src/services/copilot/create-chat-completions.ts: Normalizes model(level) into base model + reasoning/thinking fields before upstream request.
  • src/server.ts: Registers new /responses and /v1/responses routes.
  • src/routes/responses/route.ts: Adds HTTP handler for /responses passthrough with SSE streaming support.
  • src/routes/models/route.ts: Expands model listing to include level-suffixed variants.
  • src/routes/messages/non-stream-translation.ts: Preserves Claude 4.6 model names and forwards thinking into OpenAI-shaped payload.
  • src/routes/chat-completions/responses-translation.ts: Implements chat<->responses translation utilities and a Responses-stream -> Chat-stream adapter.
  • src/routes/chat-completions/handler.ts: Adds codex routing: translate chat request -> Responses API -> translate response back (stream + non-stream).
  • src/lib/model-level.ts: Defines supported levels, per-model variants, and parsing/helpers.
  • README.md: Documents the new POST /v1/responses endpoint.


Comment on lines +116 to +145
const parsedEvent = JSON.parse(rawEvent.data) as {
  type?: string
  delta?: string
}

if (
  parsedEvent.type === "response.output_text.delta"
  && typeof parsedEvent.delta === "string"
) {
  const chunk: ChatCompletionChunk = {
    id: completionId,
    object: "chat.completion.chunk",
    created,
    model,
    choices: [
      {
        index: 0,
        delta: {
          ...(hasEmittedContent ? {} : { role: "assistant" }),
          content: parsedEvent.delta,
        },
        finish_reason: null,
        logprobs: null,
      },
    ],
  }
  hasEmittedContent = true
  yield { data: JSON.stringify(chunk) }
}
}

Copilot AI Mar 4, 2026


translateResponsesStreamToChatStream currently ignores all Responses streaming event types except response.output_text.delta, so streamed tool/function call events (and any other deltas) will be dropped. This breaks the PR’s claim that tool calls are translated for streaming and can lead to clients never receiving tool_calls deltas / correct finish_reason. Extend the stream translator to handle function/tool-call related event types and emit the corresponding Chat Completions chunk deltas (including a final chunk with finish_reason: "tool_calls" when applicable).

Comment on lines +158 to +161
return {
  role: message.role,
  content,
}

Copilot AI Mar 4, 2026


translateMessage only forwards role and content and drops chat-completions fields like tool_call_id, tool_calls, and name. If a client includes prior assistant tool calls or sends tool results (role "tool" with tool_call_id) in the conversation history, that information will be lost when converting to the Responses API, which can break multi-turn tool workflows. Preserve these fields by mapping them into the Responses input schema (including associating tool results with the correct call id).

Suggested change
return {
  role: message.role,
  content,
}

const translated: ResponseInputMessage = {
  role: message.role,
  content,
  ...((message as any).name ? { name: (message as any).name } : {}),
  ...((message as any).tool_call_id
    ? { tool_call_id: (message as any).tool_call_id }
    : {}),
  ...(Array.isArray((message as any).tool_calls)
    ? { tool_calls: (message as any).tool_calls }
    : {}),
} as ResponseInputMessage
return translated

Comment on lines +12 to +16
const response = await fetch(`${copilotBaseUrl(state)}/v1/responses`, {
  method: "POST",
  headers: copilotHeaders(state),
  body: JSON.stringify(payload),
})

Copilot AI Mar 4, 2026


createResponses does not replicate the header logic used in createChatCompletions (vision header via copilotHeaders(state, true) and the X-Initiator agent/user classification). When codex requests are routed through Responses, vision inputs (input_image) and agent/tool conversations may behave differently or fail because the required headers aren’t set. Consider detecting vision/agent usage from payload.input and building headers consistently with the chat-completions path.

Comment on lines +83 to +146
export async function* translateResponsesStreamToChatStream(
  responseStream: AsyncIterable<{ data?: string }>,
  model: string,
): AsyncGenerator<SSEMessage> {
  const completionId = randomUUID()
  const created = Math.floor(Date.now() / 1000)
  let hasEmittedContent = false

  for await (const rawEvent of responseStream) {
    if (rawEvent.data === "[DONE]") {
      const endChunk: ChatCompletionChunk = {
        id: completionId,
        object: "chat.completion.chunk",
        created,
        model,
        choices: [
          {
            index: 0,
            delta: {},
            finish_reason: "stop",
            logprobs: null,
          },
        ],
      }
      yield { data: JSON.stringify(endChunk) }
      yield { data: "[DONE]" }
      return
    }

    if (!rawEvent.data) {
      continue
    }

    const parsedEvent = JSON.parse(rawEvent.data) as {
      type?: string
      delta?: string
    }

    if (
      parsedEvent.type === "response.output_text.delta"
      && typeof parsedEvent.delta === "string"
    ) {
      const chunk: ChatCompletionChunk = {
        id: completionId,
        object: "chat.completion.chunk",
        created,
        model,
        choices: [
          {
            index: 0,
            delta: {
              ...(hasEmittedContent ? {} : { role: "assistant" }),
              content: parsedEvent.delta,
            },
            finish_reason: null,
            logprobs: null,
          },
        ],
      }
      hasEmittedContent = true
      yield { data: JSON.stringify(chunk) }
    }
  }
}

Copilot AI Mar 4, 2026


There are tests for non-streaming chat<->responses translation, but the streaming translator translateResponsesStreamToChatStream is untested. Add unit tests that feed representative Responses SSE events (text deltas, function/tool-call deltas, and termination) and assert the emitted Chat Completions chunks (including role emission, tool_calls, and correct finish_reason).

@caozhiyuan
Contributor

@Godzilla675 The signature for the encrypted_content is not being returned to the request object, which may affect model performance. Additionally, OpenAI is planning to deprecate the chat completion API. You can use this instead: https://github.com/caozhiyuan/copilot-api/, which supports GPT-related models in the message-api.

@caozhiyuan
Contributor

caozhiyuan commented Mar 4, 2026

Currently, on GitHub Copilot, the Claude model uses the native message API, the GPT model uses the responses API, and only Gemini uses the OpenAI-compatible interface (non-standard). The OpenAI-compatible interface does not actually support transmitting thought signatures. @Godzilla675

@Godzilla675
Author

oh ok, thanks for the review. I'll fix the issues.

jcaose added a commit to jcaose/copilot-api that referenced this pull request Mar 8, 2026
…uting

- Add /v1/responses endpoint proxying OpenAI Responses API format
- Add model(level) syntax for reasoning effort control (e.g. gpt-5.3-codex(high))
- Route gpt-5.3-codex requests via /v1/chat/completions through Responses API
- Expand GET /v1/models to include level-suffixed model variants
- Fix claude-sonnet-4.6 / claude-opus-4.6 model name preservation in non-stream translation
@Godzilla675
Author

Godzilla675 commented Mar 13, 2026

I will add GPT-5.4 support, ok?

@Godzilla675 Godzilla675 reopened this Mar 13, 2026
Broaden GPT-family internal routing to the Responses path, add GPT-5.4 support, preserve chat-to-responses metadata, improve streaming tool call translation, align Responses request headers, and add regression coverage.