
Add Responses API support and model-level routing#205

Open
Godzilla675 wants to merge 2 commits into ericc-ch:master from Godzilla675:responses-api-support

Conversation

@Godzilla675

This adds support for the Responses API and reasoning effort control for models that support it.

Why

gpt-5.3-codex only works through the Responses API (/v1/responses), not through /v1/chat/completions. This means any client using the standard OpenAI chat completions format can't use codex models at all. This PR fixes that by transparently converting chat completions requests to Responses API format when a codex model is requested, so clients don't need to change anything.

What changed

Responses API routing

When a request comes in to /v1/chat/completions with a codex model, the proxy now:

  1. Converts the chat completions payload to Responses API format (messages -> input, max_tokens -> max_output_tokens, content part types, etc.)
  2. Sends it to the Copilot Responses API endpoint
  3. Converts the response back to chat completions format before returning it to the client
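As a rough sketch, the request-side translation (step 1) looks like the following. Only `input` and `max_output_tokens` are field names the PR mentions explicitly; the payload shapes here are simplified illustrations, not the PR's actual types.

```typescript
// Hypothetical, simplified shapes; the real translation lives in
// src/routes/chat-completions/responses-translation.ts and also handles
// content parts, tool calls, and streaming.
interface ChatCompletionsPayload {
  model: string
  messages: Array<{ role: string; content: string }>
  max_tokens?: number
}

interface ResponsesPayload {
  model: string
  input: Array<{ role: string; content: string }>
  max_output_tokens?: number
}

function chatToResponses(chat: ChatCompletionsPayload): ResponsesPayload {
  return {
    model: chat.model,
    // messages -> input
    input: chat.messages.map((m) => ({ role: m.role, content: m.content })),
    // max_tokens -> max_output_tokens (omit if the client didn't set it)
    ...(chat.max_tokens === undefined
      ? {}
      : { max_output_tokens: chat.max_tokens }),
  }
}
```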

This works for both streaming and non-streaming requests. Tool calls are also translated in both directions.

A direct /v1/responses passthrough endpoint is also available if you want to use the Responses API format directly without any translation.

Reasoning effort via model name suffix

You can now control reasoning effort by appending a level in parentheses to the model name:

gpt-5.3-codex(high)      -> gpt-5.3-codex with reasoning_effort: "high"
gpt-5.3-codex(xhigh)     -> gpt-5.3-codex with reasoning_effort: "xhigh"
gpt-5.3-codex(low)       -> gpt-5.3-codex with reasoning_effort: "low"
gpt-5.3-codex(medium)    -> gpt-5.3-codex with reasoning_effort: "medium"

Using the plain model name without a suffix works the same as before; no reasoning effort is set.
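A minimal parser for this suffix convention could look like the following. The real helper lives in src/lib/model-level.ts; this sketch only assumes the four levels listed above and may differ from the actual implementation.

```typescript
// Levels taken from the PR description; the real implementation may
// support a different set per model.
const LEVELS = ["low", "medium", "high", "xhigh"] as const
type Level = (typeof LEVELS)[number]

// Split "gpt-5.3-codex(high)" into { model: "gpt-5.3-codex", level: "high" }.
// A name without a recognized suffix passes through unchanged.
function parseModelLevel(name: string): { model: string; level?: Level } {
  const match = /^(.+)\(([a-z]+)\)$/.exec(name)
  if (match && (LEVELS as readonly string[]).includes(match[2])) {
    return { model: match[1], level: match[2] as Level }
  }
  return { model: name }
}
```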

Claude thinking/extended thinking support

Claude models also support the level suffix, which enables extended thinking:

claude-opus-4.6(high)        -> sets reasoning_effort and thinking: { type: "enabled", effort: "high" }
claude-opus-4.6-fast(medium) -> same pattern
claude-sonnet-4.6(low)       -> same pattern

When you use a level suffix on a Claude model, the proxy sets both reasoning_effort and the thinking configuration that Claude expects. If you already have a thinking object in your payload (e.g. with budget_tokens), the proxy merges the level into it rather than overwriting it.
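That merge behavior can be sketched as follows. The field shapes follow the PR description (thinking with type/effort, plus optional budget_tokens); the exact payload types are assumptions.

```typescript
interface ThinkingConfig {
  type?: string
  effort?: string
  budget_tokens?: number
}

interface ClaudePayload {
  model: string
  reasoning_effort?: string
  thinking?: ThinkingConfig
}

// Apply a parsed level to a Claude payload: set reasoning_effort, and merge
// the level into any existing thinking object instead of replacing it, so a
// client-supplied budget_tokens survives.
function applyClaudeLevel(payload: ClaudePayload, level: string): ClaudePayload {
  return {
    ...payload,
    reasoning_effort: level,
    thinking: { type: "enabled", ...payload.thinking, effort: level },
  }
}
```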

Expanded model list

/v1/models now includes all the level-suffixed variants so clients that read the model list can discover them:

  • gpt-5.3-codex, gpt-5.3-codex(low), gpt-5.3-codex(medium), gpt-5.3-codex(high), gpt-5.3-codex(xhigh)
  • claude-opus-4.6, claude-opus-4.6(low), claude-opus-4.6(medium), claude-opus-4.6(high)
  • claude-opus-4.6-fast, claude-opus-4.6-fast(low), claude-opus-4.6-fast(medium), claude-opus-4.6-fast(high)
  • claude-sonnet-4.6, claude-sonnet-4.6(low), claude-sonnet-4.6(medium), claude-sonnet-4.6(high)
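Producing that list is mechanical; here is one way to sketch it. The per-model level sets are copied from the bullets above, but the actual source of truth is src/lib/model-level.ts.

```typescript
// Per-model level sets as listed above (assumption: mirrors the PR's list).
const MODEL_LEVELS: Record<string, string[]> = {
  "gpt-5.3-codex": ["low", "medium", "high", "xhigh"],
  "claude-opus-4.6": ["low", "medium", "high"],
  "claude-opus-4.6-fast": ["low", "medium", "high"],
  "claude-sonnet-4.6": ["low", "medium", "high"],
}

// Expand each base model into itself plus its level-suffixed variants.
function expandModelList(models: Record<string, string[]>): string[] {
  return Object.entries(models).flatMap(([base, levels]) => [
    base,
    ...levels.map((level) => `${base}(${level})`),
  ])
}
```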

Other fixes

  • translateModelName in the Anthropic handler was incorrectly rewriting claude-sonnet-4.6 and claude-opus-4.6 to their non-versioned names. Fixed so 4.6 model names are preserved.

Testing

Tested against the live Copilot API with all 13 model+level combinations. All returned HTTP 200 with correct responses.

- Route gpt-5.3-codex requests through the Responses API since it
  doesn't work with chat/completions
- Add model(level) suffix parsing for reasoning effort control
  (e.g. gpt-5.3-codex(high), claude-opus-4.6(medium))
- Add direct /v1/responses endpoint for passthrough access
- Expand /v1/models to list level-suffixed variants
- Pass through Claude thinking config when level is specified
- Fix translateModelName to not clobber 4.6 model names
Copilot AI review requested due to automatic review settings March 4, 2026 02:56
Contributor

Copilot AI left a comment


Pull request overview

Adds transparent routing/translation to support Copilot’s Responses API (required for gpt-5.3-codex) while preserving a Chat Completions-compatible surface, and introduces model-suffix “reasoning effort” controls plus expanded model discovery.

Changes:

  • Add /v1/responses (and /responses) passthrough plus service client for the Responses API.
  • Route codex requests hitting /v1/chat/completions through Responses API with request/response translation (streaming + non-streaming paths).
  • Add model(level) parsing, apply reasoning_effort/Claude thinking merging, and expand /v1/models list with level-suffixed variants.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Summary per file:

  • tests/phase5-routing.test.ts: Adds unit tests for model-level parsing, payload normalization, and non-stream chat/responses translation + model list expansion.
  • src/services/copilot/create-responses.ts: Introduces Copilot Responses API client (streaming + non-streaming).
  • src/services/copilot/create-chat-completions.ts: Normalizes model(level) into base model + reasoning/thinking fields before upstream request.
  • src/server.ts: Registers new /responses and /v1/responses routes.
  • src/routes/responses/route.ts: Adds HTTP handler for /responses passthrough with SSE streaming support.
  • src/routes/models/route.ts: Expands model listing to include level-suffixed variants.
  • src/routes/messages/non-stream-translation.ts: Preserves Claude 4.6 model names and forwards thinking into OpenAI-shaped payload.
  • src/routes/chat-completions/responses-translation.ts: Implements chat<->responses translation utilities and a Responses-stream -> Chat-stream adapter.
  • src/routes/chat-completions/handler.ts: Adds codex routing: translate chat request -> Responses API -> translate response back (stream + non-stream).
  • src/lib/model-level.ts: Defines supported levels, per-model variants, and parsing/helpers.
  • README.md: Documents the new POST /v1/responses endpoint.


Comment on lines +116 to +145
const parsedEvent = JSON.parse(rawEvent.data) as {
  type?: string
  delta?: string
}

if (
  parsedEvent.type === "response.output_text.delta"
  && typeof parsedEvent.delta === "string"
) {
  const chunk: ChatCompletionChunk = {
    id: completionId,
    object: "chat.completion.chunk",
    created,
    model,
    choices: [
      {
        index: 0,
        delta: {
          ...(hasEmittedContent ? {} : { role: "assistant" }),
          content: parsedEvent.delta,
        },
        finish_reason: null,
        logprobs: null,
      },
    ],
  }
  hasEmittedContent = true
  yield { data: JSON.stringify(chunk) }
}
}

Copilot AI Mar 4, 2026


translateResponsesStreamToChatStream currently ignores all Responses streaming event types except response.output_text.delta, so streamed tool/function call events (and any other deltas) will be dropped. This breaks the PR’s claim that tool calls are translated for streaming and can lead to clients never receiving tool_calls deltas / correct finish_reason. Extend the stream translator to handle function/tool-call related event types and emit the corresponding Chat Completions chunk deltas (including a final chunk with finish_reason: "tool_calls" when applicable).

Comment on lines +158 to +161
return {
  role: message.role,
  content,
}

Copilot AI Mar 4, 2026


translateMessage only forwards role and content and drops chat-completions fields like tool_call_id, tool_calls, and name. If a client includes prior assistant tool calls or sends tool results (role "tool" with tool_call_id) in the conversation history, that information will be lost when converting to the Responses API, which can break multi-turn tool workflows. Preserve these fields by mapping them into the Responses input schema (including associating tool results with the correct call id).

Suggested change
return {
  role: message.role,
  content,
}

const translated: ResponseInputMessage = {
  role: message.role,
  content,
  ...((message as any).name ? { name: (message as any).name } : {}),
  ...((message as any).tool_call_id
    ? { tool_call_id: (message as any).tool_call_id }
    : {}),
  ...(Array.isArray((message as any).tool_calls)
    ? { tool_calls: (message as any).tool_calls }
    : {}),
} as ResponseInputMessage
return translated

Comment on lines +12 to +16
const response = await fetch(`${copilotBaseUrl(state)}/v1/responses`, {
  method: "POST",
  headers: copilotHeaders(state),
  body: JSON.stringify(payload),
})

Copilot AI Mar 4, 2026


createResponses does not replicate the header logic used in createChatCompletions (vision header via copilotHeaders(state, true) and the X-Initiator agent/user classification). When codex requests are routed through Responses, vision inputs (input_image) and agent/tool conversations may behave differently or fail because the required headers aren’t set. Consider detecting vision/agent usage from payload.input and building headers consistently with the chat-completions path.

Comment on lines +83 to +146
export async function* translateResponsesStreamToChatStream(
  responseStream: AsyncIterable<{ data?: string }>,
  model: string,
): AsyncGenerator<SSEMessage> {
  const completionId = randomUUID()
  const created = Math.floor(Date.now() / 1000)
  let hasEmittedContent = false

  for await (const rawEvent of responseStream) {
    if (rawEvent.data === "[DONE]") {
      const endChunk: ChatCompletionChunk = {
        id: completionId,
        object: "chat.completion.chunk",
        created,
        model,
        choices: [
          {
            index: 0,
            delta: {},
            finish_reason: "stop",
            logprobs: null,
          },
        ],
      }
      yield { data: JSON.stringify(endChunk) }
      yield { data: "[DONE]" }
      return
    }

    if (!rawEvent.data) {
      continue
    }

    const parsedEvent = JSON.parse(rawEvent.data) as {
      type?: string
      delta?: string
    }

    if (
      parsedEvent.type === "response.output_text.delta"
      && typeof parsedEvent.delta === "string"
    ) {
      const chunk: ChatCompletionChunk = {
        id: completionId,
        object: "chat.completion.chunk",
        created,
        model,
        choices: [
          {
            index: 0,
            delta: {
              ...(hasEmittedContent ? {} : { role: "assistant" }),
              content: parsedEvent.delta,
            },
            finish_reason: null,
            logprobs: null,
          },
        ],
      }
      hasEmittedContent = true
      yield { data: JSON.stringify(chunk) }
    }
  }
}

Copilot AI Mar 4, 2026


There are tests for non-streaming chat<->responses translation, but the streaming translator translateResponsesStreamToChatStream is untested. Add unit tests that feed representative Responses SSE events (text deltas, function/tool-call deltas, and termination) and assert the emitted Chat Completions chunks (including role emission, tool_calls, and correct finish_reason).

@caozhiyuan
Contributor

@Godzilla675 The signature for the encrypted_content is not being returned to the request object, which may affect model performance. Additionally, OpenAI is planning to deprecate the chat completion API. You can use this instead: https://github.com/caozhiyuan/copilot-api/, which supports GPT-related models in the message-api.

@caozhiyuan
Contributor

caozhiyuan commented Mar 4, 2026

Currently, on GitHub Copilot, the Claude model uses the native message API, the GPT model uses the responses API, and only Gemini uses the OpenAI-compatible interface (non-standard). The OpenAI-compatible interface does not actually support transmitting thought signatures. @Godzilla675

@Godzilla675
Author

oh ok, thanks for the review. I'll fix the issues.

jcaose added a commit to jcaose/copilot-api that referenced this pull request Mar 8, 2026
…uting

- Add /v1/responses endpoint proxying OpenAI Responses API format
- Add model(level) syntax for reasoning effort control (e.g. gpt-5.3-codex(high))
- Route gpt-5.3-codex requests via /v1/chat/completions through Responses API
- Expand GET /v1/models to include level-suffixed model variants
- Fix claude-sonnet-4.6 / claude-opus-4.6 model name preservation in non-stream translation
@Godzilla675
Author

Godzilla675 commented Mar 13, 2026

I will add GPT-5.4 support, ok?

@Godzilla675 Godzilla675 reopened this Mar 13, 2026
Broaden GPT-family internal routing to the Responses path, add GPT-5.4 support, preserve chat-to-responses metadata, improve streaming tool call translation, align Responses request headers, and add regression coverage.