
# OpenAI Responses API Frontend

The OpenAI Responses API frontend (`/v1/responses`) provides full compatibility with the OpenAI Responses API specification. This newer API is designed for structured output generation with JSON schema validation, multi-turn conversations, and enhanced reasoning model support.

## Endpoints

| Method | Path | Description |
|---|---|---|
| POST | `/v1/responses` | Create a structured response (HTTP) |
| WebSocket | `/v1/responses` | WebSocket endpoint for low-latency connections |

**WebSocket Support:** The Responses API supports WebSocket transport for persistent, low-latency connections. See the WebSocket Transport Guide for details.

## Supported Request Parameters

### Core Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model identifier |
| `input` | string/array | No | Text, image, or file inputs to the model |
| `messages` | array | No | Conversation messages (alternative to `input`) |
| `instructions` | string | No | System/developer message for model context |
| `response_format` | object | No | Structured response format specification |

### Output Control

| Parameter | Type | Description |
|---|---|---|
| `max_tokens` | integer | Maximum tokens to generate (deprecated; use `max_output_tokens`) |
| `max_output_tokens` | integer | Upper bound for output tokens, including reasoning tokens |
| `temperature` | number | Sampling temperature (0.0-2.0) |
| `top_p` | number | Nucleus sampling parameter (0.0-1.0) |
| `top_logprobs` | integer | Number of most likely tokens to return per position (0-20) |
| `n` | integer | Number of completions to generate |
| `stop` | string/array | Stop sequences |
| `presence_penalty` | number | Presence penalty (-2.0 to 2.0) |
| `frequency_penalty` | number | Frequency penalty (-2.0 to 2.0) |
| `logit_bias` | object | Logit bias adjustments |

### Tool Calling

| Parameter | Type | Description |
|---|---|---|
| `tools` | array | Array of tool definitions |
| `tool_choice` | string/object | Tool selection: `none`, `auto`, `required`, or a specific tool |
| `parallel_tool_calls` | boolean | Allow the model to run tool calls in parallel |
| `max_tool_calls` | integer | Maximum number of built-in tool calls per response |
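A tool-calling request can be sketched as the payload below. The `get_weather` tool is a hypothetical example for illustration, not something the proxy provides:

```python
# Sketch of a /v1/responses request body defining one tool and letting the
# model decide when to call it ("auto"). The tool name, description, and
# parameters here are made-up examples.
request_body = {
    "model": "gpt-4o",
    "input": "What's the weather in Paris?",
    "tools": [
        {
            "type": "function",
            "name": "get_weather",  # hypothetical tool
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    "tool_choice": "auto",        # none | auto | required | specific tool object
    "parallel_tool_calls": True,  # allow concurrent tool calls
    "max_tool_calls": 3,          # cap built-in tool invocations
}
```

Setting `tool_choice` to `"required"` instead forces the model to call at least one tool before producing a final answer.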

### Reasoning Models (o1, o3, gpt-5)

| Parameter | Type | Description |
|---|---|---|
| `reasoning` | object | Configuration for reasoning models |
| `reasoning.effort` | string | Reasoning effort: `minimal`, `low`, `medium`, `high` |
| `reasoning.summary` | string | Summary mode: `auto`, `concise`, `detailed` |

### Multi-turn Conversation

| Parameter | Type | Description |
|---|---|---|
| `conversation` | string/object | Conversation this response belongs to |
| `previous_response_id` | string | Link to the previous response for context |
| `truncation` | string | Truncation strategy for long contexts |
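Chaining turns via `previous_response_id` can be sketched as follows; the response ID is a made-up placeholder:

```python
# First turn: no linkage. After the server returns a response, its id is
# passed as previous_response_id so the backend reconstructs the context.
first_turn = {"model": "gpt-4o", "input": "My name is Ada."}

# ...suppose the response to first_turn came back with id "resp_abc123"
# (a hypothetical value)...
second_turn = {
    "model": "gpt-4o",
    "input": "What is my name?",
    "previous_response_id": "resp_abc123",  # links this turn to the prior one
    "truncation": "auto",                   # let the server trim long contexts
}
```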

### Text/Format Configuration

| Parameter | Type | Description |
|---|---|---|
| `text` | object | Text response configuration |
| `text.format` | object | Format specification (type: `text`, `json_schema`, etc.) |
| `text.verbosity` | string | Verbosity level: `low`, `medium`, `high` |

### Streaming & Processing

| Parameter | Type | Description |
|---|---|---|
| `stream` | boolean | Whether to stream the response |
| `stream_options` | object | Options for streaming responses |
| `background` | boolean | Run the model response in the background |

### Caching & Optimization

| Parameter | Type | Description |
|---|---|---|
| `prompt_cache_key` | string | Cache key for prompt caching optimization |
| `prompt_cache_retention` | string | Cache retention policy (e.g., `24h`) |
| `prompt` | object | Reference to a prompt template |
| `seed` | integer | Random seed for reproducibility |

### Metadata & Tracking

| Parameter | Type | Description |
|---|---|---|
| `metadata` | object | Key-value pairs attached to the request (max 16 pairs) |
| `safety_identifier` | string | Stable identifier for safety tracking |
| `user` | string | User identifier (deprecated; use `safety_identifier`) |
| `include` | array | Additional output data to include in the response |
| `store` | boolean | Store the response for later retrieval/distillation |
| `service_tier` | string | Service tier: `auto`, `default`, `flex` |

### Proxy-Specific Parameters

| Parameter | Type | Description |
|---|---|---|
| `session_id` | string | Session identifier for proxy session management |
| `agent` | string | Agent identifier |
| `extra_body` | object | Additional parameters passed through to the backend |

## Response Format

| Field | Type | Description |
|---|---|---|
| `id` | string | Unique response identifier |
| `object` | string | Always `"response"` |
| `created` | integer | Unix timestamp of creation |
| `model` | string | Model used for generation |
| `choices` | array | Array of completion choices (legacy format) |
| `output` | array | Responses API formatted output items |
| `output_text` | array | Aggregated text output (if available) |
| `usage` | object | Token usage statistics |
| `service_tier` | string | Actual service tier used |
| `system_fingerprint` | string | System fingerprint for reproducibility |

### Output Item Format

```json
{
  "id": "msg-resp-123-0",
  "type": "message",
  "role": "assistant",
  "status": "completed",
  "content": [
    {"type": "output_text", "text": "Response text..."}
  ]
}
```
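A client can flatten the `output[]` array to plain text with a small helper. This is a sketch assuming the item shape shown above; other item types (e.g., reasoning or tool items) are simply skipped:

```python
def extract_text(output_items):
    """Concatenate all output_text parts from a list of output items."""
    parts = []
    for item in output_items:
        if item.get("type") != "message":
            continue  # skip non-message items such as reasoning or tool calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part.get("text", ""))
    return "".join(parts)

sample = [{
    "id": "msg-resp-123-0",
    "type": "message",
    "role": "assistant",
    "status": "completed",
    "content": [{"type": "output_text", "text": "Response text..."}],
}]
print(extract_text(sample))  # Response text...
```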

## JSON Schema Validation

The frontend enforces JSON schema limits for safety and performance:

| Limit | Value |
|---|---|
| Maximum schema depth | 64 levels |
| Maximum schema nodes | 4096 nodes |
| Maximum collection items | 256 per array/object |
| Maximum string length | 65536 characters |

Invalid schemas return HTTP 400 with detailed error information.
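Clients can pre-check a schema against the depth and node limits before sending a request. How the proxy itself counts depth and nodes is not specified here, so treat this as a client-side approximation:

```python
def schema_stats(node, depth=1):
    """Return (max_depth, node_count) for a schema given as nested dicts/lists."""
    count = 1
    deepest = depth
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        children = []
    for child in children:
        if isinstance(child, (dict, list)):
            d, c = schema_stats(child, depth + 1)
            deepest = max(deepest, d)
            count += c
    return deepest, count

schema = {"type": "object", "properties": {"name": {"type": "string"}}}
depth, nodes = schema_stats(schema)
# Stay under the documented limits before sending the request
assert depth <= 64 and nodes <= 4096
```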

## Example Usage

### Basic Structured Output

```bash
curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY" \
  -d '{
    "model": "gpt-4o",
    "input": "Generate a person profile for a software engineer",
    "instructions": "You are a helpful assistant that generates structured data.",
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "person_profile",
        "description": "A person profile",
        "schema": {
          "type": "object",
          "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "occupation": {"type": "string"},
            "skills": {
              "type": "array",
              "items": {"type": "string"}
            }
          },
          "required": ["name", "age", "occupation"]
        },
        "strict": true
      }
    },
    "max_output_tokens": 500,
    "temperature": 0.7
  }'
```

### Streaming Response

```bash
curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Write a story"}],
    "stream": true
  }'
```
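Streamed responses arrive as server-sent events. A minimal parser for the `data:` lines might look like this; the event shape and `[DONE]` sentinel below are assumptions based on typical SSE streams, not the proxy's exact wire format:

```python
import json

def parse_sse_chunk(raw):
    """Yield parsed JSON payloads from a buffer of SSE lines."""
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments and blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # common end-of-stream sentinel
            break
        yield json.loads(payload)

# Example buffer with hypothetical delta events
raw = (
    'data: {"type": "response.output_text.delta", "delta": "Hel"}\n'
    'data: {"type": "response.output_text.delta", "delta": "lo"}\n'
    'data: [DONE]\n'
)
text = "".join(event["delta"] for event in parse_sse_chunk(raw))
print(text)  # Hello
```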

### With Reasoning Configuration

```bash
curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_PROXY_KEY" \
  -d '{
    "model": "o1-preview",
    "input": "Solve this complex mathematical problem...",
    "reasoning": {
      "effort": "high",
      "summary": "detailed"
    },
    "max_output_tokens": 4000
  }'
```

### Using the Input Field (Alternative to Messages)

```bash
curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What is 2+2?",
    "instructions": "Respond with just the number"
  }'
```

## Backend Translation

Requests to `/v1/responses` are automatically translated to the appropriate format for any configured backend:

| Target Backend | Translation |
|---|---|
| OpenAI | Native passthrough or Chat Completions translation |
| Anthropic | Full translation with tool support |
| Gemini | Full translation with multimodal support |
| OpenRouter | OpenAI-compatible translation |
| Other backends | Chat Completions format translation |

The `input` field is converted to `messages` format, and `instructions` becomes a system message when routing to backends that don't natively support the Responses API.
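That translation can be sketched as follows. This is a simplified approximation for illustration; the proxy's actual logic may handle more cases (images, files, typed content parts):

```python
def to_chat_messages(input_value, instructions=None):
    """Convert Responses-style input/instructions to Chat Completions messages."""
    messages = []
    if instructions:
        # instructions becomes the system message
        messages.append({"role": "system", "content": instructions})
    if isinstance(input_value, str):
        # a plain string becomes a single user message
        messages.append({"role": "user", "content": input_value})
    else:
        # assume it is already a list of message-like items
        messages.extend(input_value)
    return messages

msgs = to_chat_messages("What is 2+2?", "Respond with just the number")
print(msgs[0]["role"], "/", msgs[1]["content"])  # system / What is 2+2?
```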

## Differences from Chat Completions

| Feature | Chat Completions | Responses API |
|---|---|---|
| Input format | `messages` array only | `input` field or `messages` |
| System message | In `messages` array | `instructions` field |
| Output format | `choices[].message` | `output[]` with typed content |
| Structured output | `response_format` | Enhanced `response_format` with strict validation |
| Reasoning config | `reasoning_effort` only | Full `reasoning` object |
| Conversation state | Manual | `conversation`, `previous_response_id` |

## Related Documentation