62 changes: 62 additions & 0 deletions docs/background.md
@@ -0,0 +1,62 @@
# Background mode

OpenAI's [Responses API background mode](https://platform.openai.com/docs/guides/background) lets long-running model calls survive client disconnects: the server keeps processing the request and you poll it to completion. This matters for reasoning-heavy single turns (`gpt-5.2-pro`, deep-research models) that can take minutes and otherwise fall foul of HTTP timeouts on Vercel, Cloudflare Workers, corporate proxies, etc.

The Agents SDK exposes background mode via two new fields on [`ModelSettings`][agents.model_settings.ModelSettings]:

- `background: bool | None` — opt in to background mode.
- `background_poll_interval_seconds: float | None` — optional fixed poll interval. When unset, the SDK honors the `openai-poll-after-ms` response header and falls back to 1.0 second.
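
The interval selection described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the SDK's code: `next_poll_interval` and its parameters are hypothetical names, and only the `openai-poll-after-ms` header name and the 1.0 second fallback come from the text above.

```python
from __future__ import annotations

from collections.abc import Mapping

POLL_AFTER_HEADER = "openai-poll-after-ms"  # header name from the text above
DEFAULT_INTERVAL_SECONDS = 1.0  # documented fallback


def next_poll_interval(
    explicit: float | None,
    headers: Mapping[str, str],
    current: float = DEFAULT_INTERVAL_SECONDS,
) -> float:
    """Hypothetical helper: pick the delay (seconds) before the next poll."""
    # A pinned interval always wins; the server hint is ignored.
    if explicit is not None:
        return explicit
    raw = headers.get(POLL_AFTER_HEADER)
    if raw is None:
        return current
    try:
        return float(raw) / 1000.0  # the header value is in milliseconds
    except ValueError:
        # Malformed hint: keep the interval already in use.
        return current
```

Keeping the choice in one function makes the precedence explicit: a pinned interval first, then the server hint, then the default.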

## Transparent use through `Runner`

Set the flag on your agent's `ModelSettings` and run as usual. The SDK submits with `background=True`, polls `client.responses.retrieve(id)` adaptively, and returns the terminal response — `Runner.run` and `Runner.run_streamed` need no other changes.

```python
from agents import Agent, ModelSettings, Runner

agent = Agent(
    name="reasoner",
    model="gpt-5.2-pro",
    model_settings=ModelSettings(background=True),
)
result = await Runner.run(agent, "Plan a multi-stage research workflow.")
print(result.final_output)
```

For streaming, `background=True` is passed through to `responses.create(stream=True, background=True)` so the server keeps generating across client disconnects. Client-side auto-resume via `starting_after` is intentionally not part of this MVP — plain `openai-python` doesn't auto-resume either.

```python
async for event in Runner.run_streamed(agent, "Stream me a long answer").stream_events():
    print(event)
```

## Retrieving a response by id

If you captured a `response_id` and want to fetch the latest server state from a different process or worker, call `client.responses.retrieve(response_id)` directly on the underlying `AsyncOpenAI` client. The SDK deliberately adds no wrapper of its own, since one would widen the API surface without adding capability.

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()
response = await client.responses.retrieve(response_id)
print(response.status)
```
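
For a worker that needs to wait rather than take a single snapshot, the retrieve call above can be wrapped in a small polling loop. The following is a minimal sketch, not part of the SDK: `wait_until_terminal`, `max_polls`, and the injected `retrieve` callable are hypothetical names chosen for illustration. The terminal statuses are those of the Responses API (`completed`, `failed`, `cancelled`, `incomplete`).

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

# "queued" and "in_progress" are the non-terminal statuses.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled", "incomplete"})


async def wait_until_terminal(
    retrieve: Callable[[str], Awaitable[Any]],
    response_id: str,
    interval_seconds: float = 1.0,
    max_polls: int = 600,
) -> Any:
    """Hypothetical helper: poll `retrieve` until the response is terminal."""
    for _ in range(max_polls):
        response = await retrieve(response_id)
        if response.status in TERMINAL_STATUSES:
            return response
        await asyncio.sleep(interval_seconds)
    # Bound the loop so a stuck response cannot keep a worker busy forever.
    raise TimeoutError(f"{response_id} still running after {max_polls} polls")
```

With a real client this would be called as `await wait_until_terminal(client.responses.retrieve, response_id)`; injecting the callable keeps the loop testable without network access.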

## Cancellation

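If the surrounding task is cancelled (`asyncio.CancelledError`) or the poll loop hits an unexpected error before the response reaches a terminal state, the SDK schedules a best-effort, fire-and-forget `client.responses.cancel(response_id)` so the in-flight server-side response is not leaked. Failures of the cancel call itself are logged at debug level and ignored, and the original exception then propagates to the caller as usual.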
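
The cancel-on-interrupt pattern can be sketched with plain `asyncio`. This is a simplified illustration, not the SDK's implementation: `poll_with_cleanup`, `poll_once`, and `cancel_remote` are hypothetical names standing in for the poll loop, a single `retrieve` check, and `client.responses.cancel(response_id)`.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable, Coroutine
from typing import Any

# Strong references keep fire-and-forget tasks alive until they finish.
_cleanup_tasks: set[asyncio.Task[None]] = set()


async def _best_effort(cancel_remote: Callable[[], Coroutine[Any, Any, None]]) -> None:
    try:
        await cancel_remote()
    except Exception:
        pass  # best-effort: a failed cancel is ignored


async def poll_with_cleanup(
    poll_once: Callable[[], Awaitable[bool]],
    cancel_remote: Callable[[], Coroutine[Any, Any, None]],
    interval_seconds: float = 1.0,
) -> None:
    """Hypothetical helper: poll until done; cancel the remote job if interrupted."""
    try:
        while not await poll_once():
            await asyncio.sleep(interval_seconds)
    except BaseException:
        # Covers asyncio.CancelledError as well as ordinary errors.
        # The cancel runs as a separate task so the original exception
        # is re-raised without waiting on the network.
        task = asyncio.create_task(_best_effort(cancel_remote))
        _cleanup_tasks.add(task)
        task.add_done_callback(_cleanup_tasks.discard)
        raise
```

Because the cleanup runs as its own task, it only completes if the event loop keeps running for a moment after the cancellation, which is why the cancel is described as best-effort.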

## Compatibility

Background mode is **supported only by the HTTP Responses transport** ([`OpenAIResponsesModel`][agents.models.openai_responses.OpenAIResponsesModel]). Setting `background=True` on either of these adapters raises [`UserError`][agents.exceptions.UserError] so the durability guarantee you opted into is not silently demoted:

- [`OpenAIResponsesWSModel`][agents.models.openai_responses.OpenAIResponsesWSModel] — the WebSocket transport always streams and cannot decouple submit from poll.
- [`OpenAIChatCompletionsModel`][agents.models.openai_chatcompletions.OpenAIChatCompletionsModel] — the Chat Completions API has no `background` parameter.

If you reach a non-OpenAI provider through LiteLLM / AnyLLM, the field still exists on `ModelSettings`, but those adapters do not forward it explicitly; whether it has any effect depends on the underlying provider.

## Limits

- Background responses are retained server-side for **about 10 minutes**.
- Background mode is **not ZDR-compatible**.
- The `Runner` does not impose its own deadline on a background poll. If you need a hard ceiling, wrap your call (e.g. `asyncio.wait_for(Runner.run(agent, ...), timeout=600)`); on timeout, the SDK's cancel-on-CancelledError logic still fires.
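
The hard-ceiling pattern from the last item can be sketched as follows. `run_with_deadline` is a hypothetical wrapper, and returning `None` on timeout is an assumption made for brevity; a real caller might re-raise instead.

```python
import asyncio


async def run_with_deadline(coro, timeout_seconds: float):
    """Hypothetical wrapper: return the result, or None if the deadline passes."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled `coro`, so CancelledError was raised
        # inside it and its except/finally blocks have run.
        return None
```

Under these assumptions a call would look like `await run_with_deadline(Runner.run(agent, prompt), 600)`.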
Empty file.
80 changes: 80 additions & 0 deletions examples/background_mode/main.py
@@ -0,0 +1,80 @@
"""Example demonstrating Responses API background mode.

When `ModelSettings(background=True)` is set, the SDK submits the underlying
`client.responses.create()` call with `background=True` and adaptively polls
`client.responses.retrieve(...)` until the response reaches a terminal state.
This lets long-running reasoning calls (gpt-5.2-pro, deep-research-class
workloads) survive HTTP / proxy / serverless timeouts that would otherwise
abort a synchronous call.

To run this example:

    export OPENAI_API_KEY=...
    python -m examples.background_mode.main

Compare the two runs below: with and without `background=True`. The output
should be equivalent, but only the background variant keeps the server-side
work alive across transient client-side disconnects.
"""

from __future__ import annotations

import asyncio
import os

from agents import Agent, ModelSettings, Runner

MODEL_NAME = os.getenv("BACKGROUND_MODEL_NAME") or "gpt-5.2-pro"
PROMPT = (
    "Plan a three-stage research workflow for studying the long-term effects "
    "of intermittent fasting on cognitive performance. For each stage, list "
    "the primary research question, the methods, and one specific risk to "
    "external validity."
)


async def run_synchronous() -> str:
    agent = Agent(name="planner", model=MODEL_NAME)
    print("\n=== Without background mode (synchronous) ===")
    result = await Runner.run(agent, PROMPT)
    return str(result.final_output)


async def run_background() -> str:
    agent = Agent(
        name="planner",
        model=MODEL_NAME,
        model_settings=ModelSettings(background=True),
    )
    print("\n=== With background mode (submit + adaptive poll) ===")
    result = await Runner.run(agent, PROMPT)
    return str(result.final_output)


async def main() -> None:
    try:
        sync_output = await run_synchronous()
        print(sync_output)

        bg_output = await run_background()
        print(bg_output)

        # Both runs use the same HTTP transport and prompt, so the final
        # outputs should be comparable. No seed is set, so exact equality is
        # not guaranteed. Background mode's win is durability, not different
        # content.
        if sync_output.strip() == bg_output.strip():
            print("\nOutputs match.")
        else:
            print(
                "\nOutputs differ; this is expected when sampling is non-deterministic, "
                "but the background variant survived any transient disconnects."
            )
    except Exception as exc:
        print(f"Error: {exc}")
        print("\nNote: background mode is supported only by the Responses API")
        print("HTTP transport. Set OPENAI_API_KEY and try a model that")
        print("accepts long-running background requests (e.g. gpt-5.2-pro).")


if __name__ == "__main__":
    asyncio.run(main())
4 changes: 4 additions & 0 deletions mkdocs.yml
@@ -64,6 +64,7 @@ plugins:
- Guardrails: guardrails.md
- Running agents: running_agents.md
- Streaming: streaming.md
- Background mode: background.md
- Agent orchestration: multi_agent.md
- Handoffs: handoffs.md
- Results: results.md
@@ -213,6 +214,7 @@ plugins:
- guardrails.md
- running_agents.md
- streaming.md
- background.md
- multi_agent.md
- handoffs.md
- results.md
@@ -256,6 +258,7 @@ plugins:
- guardrails.md
- running_agents.md
- streaming.md
- background.md
- multi_agent.md
- handoffs.md
- results.md
@@ -299,6 +302,7 @@ plugins:
- guardrails.md
- running_agents.md
- streaming.md
- background.md
- multi_agent.md
- handoffs.md
- results.md
24 changes: 24 additions & 0 deletions src/agents/model_settings.py
@@ -79,6 +79,7 @@ class MCPToolChoice:
    "top_logprobs",
    "retry",
    "context_management",
    "background",
)


@@ -191,6 +192,29 @@ class ModelSettings:
    to enable server-side compaction when the rendered context crosses a token threshold.
    """

    background: bool | None = None
    """Whether to run the model response in the background.

    When ``True``, the SDK submits via ``client.responses.create(background=True)``
    and polls ``client.responses.retrieve(...)`` until the response reaches a
    terminal state. Background mode lets long single-turn calls (reasoning models,
    deep-research workloads) survive HTTP / proxy / serverless timeouts.

    Only supported by ``OpenAIResponsesModel`` (HTTP transport). Setting this on
    ``OpenAIResponsesWSModel`` or ``OpenAIChatCompletionsModel`` raises ``UserError``.
    Background mode is not ZDR-compatible and response data is retained server-side
    for ~10 minutes.
    `Learn more <https://platform.openai.com/docs/guides/background>`_.
    """

    background_poll_interval_seconds: float | None = None
    """Polling interval (seconds) when ``background=True``.

    When unset, the SDK honors the ``openai-poll-after-ms`` response header from
    the most recent ``retrieve()``; falls back to 1.0 second when the header is
    absent. Ignored when ``background`` is not enabled.
    """

    def resolve(self, override: ModelSettings | None) -> ModelSettings:
        """Produce a new ModelSettings by overlaying any non-None values from the
        override on top of this instance."""
11 changes: 11 additions & 0 deletions src/agents/models/openai_chatcompletions.py
@@ -71,6 +71,15 @@ def _non_null_or_omit(self, value: Any) -> Any:
    def _supports_default_prompt_cache_key(self) -> bool:
        return ChatCmplHelpers.is_openai(self._get_client())

    @staticmethod
    def _handle_unsupported_background(model_settings: ModelSettings) -> None:
        if model_settings.background:
            raise UserError(
                "ModelSettings.background=True is not supported by "
                "OpenAIChatCompletionsModel; the Chat Completions API has no "
                "background-mode equivalent. Use OpenAIResponsesModel instead."
            )

    def _handle_unsupported_prompt(self, prompt: ResponsePromptParam | None) -> None:
        if prompt is None:
            return
@@ -140,6 +149,7 @@ async def get_response(
        conversation_id: str | None = None,
        prompt: ResponsePromptParam | None = None,
    ) -> ModelResponse:
        self._handle_unsupported_background(model_settings)
        self._handle_unsupported_server_managed_conversation_state(
            previous_response_id=previous_response_id,
            conversation_id=conversation_id,
@@ -274,6 +284,7 @@ async def stream_response(
        """
        Yields a partial message as it is generated, as well as the usage information.
        """
        self._handle_unsupported_background(model_settings)
        self._handle_unsupported_server_managed_conversation_state(
            previous_response_id=previous_response_id,
            conversation_id=conversation_id,
119 changes: 118 additions & 1 deletion src/agents/models/openai_responses.py
@@ -92,6 +92,32 @@
    value for value in get_args(ResponseIncludable) if isinstance(value, str)
)

# Terminal `Response.status` values per the OpenAI Responses API. Mirrors the
# `ResponseStatus` literal type in `openai-python`. A response whose status is
# absent from this set (`queued` / `in_progress`) is still being generated and
# must be polled.
_RESPONSE_TERMINAL_STATUSES: frozenset[str] = frozenset(
    {"completed", "failed", "cancelled", "incomplete"}
)

# Default polling interval when `background=True` and no explicit interval or
# server header is available. Matches the fallback used by openai-python's
# `create_and_poll` helpers.
_DEFAULT_BACKGROUND_POLL_INTERVAL_SECONDS = 1.0

# Server-sent hint header advising the next poll delay (in milliseconds). When
# the caller has not pinned an explicit `background_poll_interval_seconds`, we
# honor this header so the loop adapts to server backpressure.
_BACKGROUND_POLL_AFTER_HEADER = "openai-poll-after-ms"


def _is_response_terminal_status(status: str | None) -> bool:
    """True when `status` is a terminal value (or unset, which we treat as
    terminal to avoid spinning on unexpected payloads)."""
    if status is None:
        return True
    return status in _RESPONSE_TERMINAL_STATUSES


class _NamespaceToolParam(TypedDict):
    type: Literal["namespace"]
@@ -444,6 +470,82 @@ def _consume_background_cleanup_task_result(task: asyncio.Task[Any]) -> None:
        except Exception as exc:
            logger.debug(f"Background stream cleanup failed after cancellation: {exc}")

    def _schedule_background_response_cancel(self, client: AsyncOpenAI, response_id: str) -> None:
        """Best-effort fire-and-forget cancel of an in-flight background response.

        Invoked when the poll loop is cancelled or hits a non-recoverable error
        before reaching a terminal state, so that server-side compute is not
        leaked. Failures from the cancel call itself are swallowed.
        """

        async def _do_cancel() -> None:
            try:
                await client.responses.cancel(response_id)
            except Exception as exc:
                logger.debug(
                    f"Background response cancel for {response_id} failed (ignored): {exc}"
                )

        try:
            task = asyncio.create_task(_do_cancel())
        except RuntimeError:
            # No running loop available (e.g. interpreter shutdown). Nothing we
            # can do here; the server response will time out on its own.
            return
        task.add_done_callback(self._consume_background_cleanup_task_result)

    async def _poll_background_response_until_terminal(
        self,
        *,
        client: AsyncOpenAI,
        response: Response,
        poll_interval_seconds: float | None,
    ) -> Response:
        """Poll `responses.retrieve(id)` until the response reaches a terminal status.

        When `poll_interval_seconds` is provided it pins the cadence; otherwise the
        loop honors the `openai-poll-after-ms` response header and falls back to
        ``_DEFAULT_BACKGROUND_POLL_INTERVAL_SECONDS`` when no header is present.
        Mirrors the adaptive-polling pattern used by `openai-python`'s
        `create_and_poll` helpers.

        On cancellation or unexpected error mid-poll, the in-flight server-side
        response is cancelled best-effort via
        `_schedule_background_response_cancel` so compute is not leaked.
        Reaching a non-`completed` terminal state (`failed` / `cancelled` /
        `incomplete`) raises `ModelBehaviorError`.
        """
        response_id = response.id
        explicit_interval = poll_interval_seconds
        interval = (
            explicit_interval
            if explicit_interval is not None
            else _DEFAULT_BACKGROUND_POLL_INTERVAL_SECONDS
        )
        try:
            while not _is_response_terminal_status(response.status):
                await asyncio.sleep(interval)
                raw = await client.responses.with_raw_response.retrieve(response_id)
                response = raw.parse()
                if explicit_interval is None:
                    header_value = raw.headers.get(_BACKGROUND_POLL_AFTER_HEADER)
                    if header_value is not None:
                        try:
                            interval = float(header_value) / 1000.0
                        except (TypeError, ValueError):
                            # Server sent a malformed header; keep current interval.
                            pass
        except BaseException:
            self._schedule_background_response_cancel(client, response_id)
            raise

        if response.status != "completed":
            # Non-`completed` terminal status; the server has already finished
            # so we don't need to cancel. Raise a model-error so callers see a
            # consistent failure type.
            raise response_terminal_failure_error(f"response.{response.status}", response)
        return response

    async def get_response(
        self,
        system_instructions: str | None,
@@ -693,7 +795,14 @@ async def _fetch_response(

         if not stream:
             response = await client.responses.create(**create_kwargs)
-            return cast(Response, response)
+            response = cast(Response, response)
+            if model_settings.background and not _is_response_terminal_status(response.status):
+                response = await self._poll_background_response_until_terminal(
+                    client=client,
+                    response=response,
+                    poll_interval_seconds=model_settings.background_poll_interval_seconds,
+                )
+            return response

         streaming_response = getattr(client.responses, "with_streaming_response", None)
         stream_create = getattr(streaming_response, "create", None)
@@ -849,6 +958,7 @@ def _build_response_create_kwargs(
            "extra_body": model_settings.extra_body,
            "text": response_format,
            "store": self._non_null_or_omit(model_settings.store),
            "background": self._non_null_or_omit(model_settings.background),
            "prompt_cache_retention": self._non_null_or_omit(model_settings.prompt_cache_retention),
            "reasoning": self._non_null_or_omit(model_settings.reasoning),
            "metadata": self._non_null_or_omit(model_settings.metadata),
@@ -1082,6 +1192,13 @@ async def _fetch_response(
        stream: Literal[True] | Literal[False] = False,
        prompt: ResponsePromptParam | None = None,
    ) -> Response | AsyncIterator[ResponseStreamEvent]:
        if model_settings.background:
            raise UserError(
                "ModelSettings.background=True is not supported by "
                "OpenAIResponsesWSModel; the WebSocket transport always streams "
                "and cannot decouple submit from poll. Use OpenAIResponsesModel "
                "(HTTP transport) instead."
            )
        create_kwargs = self._build_response_create_kwargs(
            system_instructions=system_instructions,
            input=input,