The OpenAI Responses API frontend (/v1/responses) provides full compatibility with the OpenAI Responses API specification. This newer API is designed for structured output generation with JSON schema validation, multi-turn conversations, and enhanced reasoning model support.
## Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/v1/responses` | Create a structured response (HTTP) |
| WebSocket | `/v1/responses` | WebSocket endpoint for low-latency connections |
> **WebSocket Support:** The Responses API supports WebSocket transport for persistent, low-latency connections. See the WebSocket Transport Guide for details.
## Supported Request Parameters

### Core Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `model` | string | Yes | Model identifier |
| `input` | string/array | No | Text, image, or file inputs to the model |
| `messages` | array | No | Conversation messages (alternative to `input`) |
| `instructions` | string | No | System/developer message for model context |
| `response_format` | object | No | Structured response format specification |
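As a sketch of how the core parameters fit together, the request body below combines `model`, `input`, and `instructions` (values are illustrative; `messages` could be sent in place of `input`):

```python
import json

# Minimal Responses API request body built from the core parameters above.
# "input" carries the prompt text; "instructions" supplies the
# system/developer message.
request_body = {
    "model": "gpt-4o",                  # required model identifier
    "input": "What is 2+2?",            # text input to the model
    "instructions": "Respond with just the number",
}

# Serialize to the JSON body that would be POSTed to /v1/responses.
payload = json.dumps(request_body)
print(payload)
```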
### Output Control

| Parameter | Type | Description |
|-----------|------|-------------|
| `max_tokens` | integer | Maximum tokens to generate (deprecated) |
| `max_output_tokens` | integer | Upper bound for output tokens, including reasoning tokens |
| `temperature` | number | Sampling temperature (0.0-2.0) |
| `top_p` | number | Nucleus sampling parameter (0.0-1.0) |
| `top_logprobs` | integer | Number of most likely tokens to return (0-20) |
| `n` | integer | Number of completions to generate |
| `stop` | string/array | Stop sequences |
| `presence_penalty` | number | Presence penalty (-2.0 to 2.0) |
| `frequency_penalty` | number | Frequency penalty (-2.0 to 2.0) |
| `logit_bias` | object | Logit bias adjustments |
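A sketch of output-control parameters layered onto a basic request; the ranges mirror the table above (temperature 0.0-2.0, `top_p` 0.0-1.0, penalties -2.0 to 2.0), while the specific values are illustrative:

```python
import json

# Output-control parameters on a Responses API request.
request_body = {
    "model": "gpt-4o",
    "input": "Write a haiku about the sea.",
    "max_output_tokens": 256,   # upper bound, including reasoning tokens
    "temperature": 0.7,         # sampling temperature (0.0-2.0)
    "top_p": 0.9,               # nucleus sampling (0.0-1.0)
    "stop": ["\n\n"],           # stop generating at a blank line
    "presence_penalty": 0.5,    # -2.0 to 2.0
    "frequency_penalty": 0.2,   # -2.0 to 2.0
}
print(json.dumps(request_body, indent=2))
```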
### Tool Calling

| Parameter | Type | Description |
|-----------|------|-------------|
| `tools` | array | Array of tool definitions |
| `tool_choice` | string/object | Tool selection: `none`, `auto`, `required`, or a specific tool |
| `parallel_tool_calls` | boolean | Allow the model to run tool calls in parallel |
| `max_tool_calls` | integer | Maximum number of built-in tool calls per response |
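To illustrate, here is a function tool definition in the Responses API's flattened `type`/`name`/`parameters` shape; the `get_weather` tool itself is hypothetical:

```python
import json

# A request with one function tool. The tool name, description, and
# schema are made up for illustration.
request_body = {
    "model": "gpt-4o",
    "input": "What's the weather in Paris?",
    "tools": [
        {
            "type": "function",
            "name": "get_weather",                      # hypothetical tool
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    "tool_choice": "auto",        # none | auto | required | {...specific tool}
    "parallel_tool_calls": True,  # allow parallel tool calls
}
print(json.dumps(request_body, indent=2))
```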
### Reasoning Models (o1, o3, gpt-5)

| Parameter | Type | Description |
|-----------|------|-------------|
| `reasoning` | object | Configuration for reasoning models |
| `reasoning.effort` | string | Reasoning effort: `minimal`, `low`, `medium`, `high` |
| `reasoning.summary` | string | Summary mode: `auto`, `concise`, `detailed` |
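A sketch of a request targeting a reasoning model, using the `reasoning` object from the table above (model name and prompt are illustrative):

```python
# Reasoning configuration for a reasoning-capable model.
request_body = {
    "model": "o3",                  # reasoning model from the table above
    "input": "Prove that the sum of two even numbers is even.",
    "reasoning": {
        "effort": "medium",         # minimal | low | medium | high
        "summary": "auto",          # auto | concise | detailed
    },
    "max_output_tokens": 2048,      # budget includes reasoning tokens
}
```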
### Multi-turn Conversation

| Parameter | Type | Description |
|-----------|------|-------------|
| `conversation` | string/object | Conversation this response belongs to |
| `previous_response_id` | string | Link to the previous response for context |
| `truncation` | string | Truncation strategy for long contexts |
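For example, a follow-up turn can be chained to an earlier response via `previous_response_id`; the response id below is a placeholder (a real one comes from the `id` field of a previous reply):

```python
# Follow-up request that continues an earlier exchange.
follow_up = {
    "model": "gpt-4o",
    "input": "Now double that result.",
    "previous_response_id": "resp_abc123",  # placeholder id from a prior response
    "truncation": "auto",                   # let the server trim long context
}
```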
### Text/Format Configuration

| Parameter | Type | Description |
|-----------|------|-------------|
| `text` | object | Text response configuration |
| `text.format` | object | Format specification (`type`: `text`, `json_schema`, etc.) |
| `text.verbosity` | string | Verbosity level: `low`, `medium`, `high` |
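As a sketch, a structured-output request might attach a JSON schema under `text.format`; the exact envelope fields follow the OpenAI `json_schema` shape and the schema itself is illustrative:

```python
# Request constrained to a JSON object matching a schema.
request_body = {
    "model": "gpt-4o",
    "input": "Extract the name and age from: Alice is 30.",
    "text": {
        "format": {
            "type": "json_schema",      # vs. plain "text"
            "name": "person",           # illustrative schema name
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
                "additionalProperties": False,
            },
        },
        "verbosity": "low",             # low | medium | high
    },
}
```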
### Streaming & Processing

| Parameter | Type | Description |
|-----------|------|-------------|
| `stream` | boolean | Whether to stream the response |
| `stream_options` | object | Options for streaming responses |
| `background` | boolean | Run the model response in the background |
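A streaming request simply sets `stream` to true; the server then emits server-sent events. Below is a minimal sketch of an SSE `data:` line parser, exercised on a canned event whose shape is assumed for illustration:

```python
import json

# Streaming is requested by setting "stream": true.
request_body = {"model": "gpt-4o", "input": "Count to three.", "stream": True}

def parse_sse_line(line: str):
    """Decode one SSE "data: ..." line into a dict, or return None."""
    if not line.startswith("data: "):
        return None                     # ignore comments / other fields
    body = line[len("data: "):].strip()
    if body == "[DONE]":
        return None                     # end-of-stream sentinel
    return json.loads(body)

# Hypothetical event shape for a text delta.
sample = 'data: {"type": "response.output_text.delta", "delta": "1"}'
event = parse_sse_line(sample)
print(event["delta"])
```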
### Caching & Optimization

| Parameter | Type | Description |
|-----------|------|-------------|
| `prompt_cache_key` | string | Cache key for prompt caching optimization |
| `prompt_cache_retention` | string | Cache retention policy (e.g., `24h`) |
| `prompt` | object | Reference to a prompt template |
| `seed` | integer | Random seed for reproducibility |
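For instance, a request that reuses a long shared prefix can carry caching parameters; the key, retention value, and prompt text here are illustrative:

```python
# A stable long prefix shared across many requests (illustrative).
LONG_SYSTEM_PROMPT = "You are a support assistant. " * 100

request_body = {
    "model": "gpt-4o",
    "instructions": LONG_SYSTEM_PROMPT,    # identical prefix on every call
    "input": "Summarize today's tickets.",
    "prompt_cache_key": "support-bot-v2",  # stable key improves cache hits
    "prompt_cache_retention": "24h",       # retention policy, e.g. 24h
    "seed": 42,                            # best-effort reproducibility
}
```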
### Metadata & Tracking

| Parameter | Type | Description |
|-----------|------|-------------|
| `metadata` | object | Key-value pairs attached to the request (max 16) |
| `safety_identifier` | string | Stable identifier for safety tracking |
| `user` | string | User identifier (deprecated; use `safety_identifier`) |
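As a small sketch, tracking fields attach like any other parameter; the keys and identifier below are made up:

```python
# Metadata (up to 16 key-value pairs) plus a stable safety identifier,
# which replaces the deprecated "user" field.
request_body = {
    "model": "gpt-4o",
    "input": "Hello!",
    "metadata": {"team": "billing", "ticket": "T-1042"},  # illustrative pairs
    "safety_identifier": "user-7f3a",                     # illustrative id
}
assert len(request_body["metadata"]) <= 16  # enforce the documented limit
```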
## Example Request

```shell
curl -X POST http://localhost:8000/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What is 2+2?",
    "instructions": "Respond with just the number"
  }'
```
## Backend Translation

Requests to `/v1/responses` are automatically translated to the appropriate format for any configured backend:

| Target Backend | Translation |
|----------------|-------------|
| OpenAI | Native passthrough or Chat Completions translation |
| Anthropic | Full translation with tool support |
| Gemini | Full translation with multimodal support |
| OpenRouter | OpenAI-compatible translation |
| Other backends | Chat Completions format translation |
The `input` field is converted to `messages` format, and `instructions` becomes a system message when routing to backends that don't natively support the Responses API.
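The translation described above can be sketched as follows; this is a simplified illustration, not the proxy's actual implementation (the real translation also handles array inputs, images, tools, and so on):

```python
# Simplified sketch: "instructions" becomes a system message and a string
# "input" becomes a user message, producing a Chat Completions request.
def to_chat_completions(responses_request: dict) -> dict:
    messages = []
    if "instructions" in responses_request:
        messages.append({"role": "system",
                         "content": responses_request["instructions"]})
    if isinstance(responses_request.get("input"), str):
        messages.append({"role": "user",
                         "content": responses_request["input"]})
    return {"model": responses_request["model"], "messages": messages}

chat_req = to_chat_completions({
    "model": "gpt-4o",
    "input": "What is 2+2?",
    "instructions": "Respond with just the number",
})
print(chat_req["messages"][0]["role"])  # system
```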