openai · dtlics · May 20, 2026
diff --git a/docs/background.md b/docs/background.md
@@ -0,0 +1,62 @@
+# Background mode
+
+OpenAI's [Responses API background mode](https://platform.openai.com/docs/guides/background) lets long-running model calls survive client disconnects: the server keeps processing the request and you poll it to completion. This matters for reasoning-heavy single turns (`gpt-5.2-pro`, deep-research models) that can take minutes and otherwise fall foul of HTTP timeouts on Vercel, Cloudflare Workers, corporate proxies, etc.
+
+The Agents SDK exposes background mode via two new fields on [`ModelSettings`][agents.model_settings.ModelSettings]:
+
+- `background: bool | None` — opt in to background mode.
+- `background_poll_interval_seconds: float | None` — optional fixed poll interval. When unset, the SDK honors the `openai-poll-after-ms` response header and falls back to 1.0 second.
+
+## Transparent use through `Runner`
+
+Set the flag on your agent's `ModelSettings` and run as usual. For non-streamed runs, the SDK submits with `background=True`, polls `client.responses.retrieve(id)` adaptively, and returns the terminal response. Neither `Runner.run` nor `Runner.run_streamed` needs any other change.
+
+```python
+from agents import Agent, ModelSettings, Runner
+
+agent = Agent(
+    name="reasoner",
+    model="gpt-5.2-pro",
+    model_settings=ModelSettings(background=True),
+)
+result = await Runner.run(agent, "Plan a multi-stage research workflow.")
+print(result.final_output)
+```
+
+For streaming, `background=True` is passed through to `responses.create(stream=True, background=True)` so the server keeps generating across client disconnects. Client-side auto-resume via `starting_after` is intentionally not part of this MVP — plain `openai-python` doesn't auto-resume either.
+
+```python
+async for event in Runner.run_streamed(agent, "Stream me a long answer").stream_events():
+    print(event)
+```
+
+## Retrieving a response by id
+
+If you captured a `response_id` and want to fetch the latest server state from a different process or worker, call `client.responses.retrieve(response_id)` on the underlying `AsyncOpenAI` client directly. The SDK deliberately ships no wrapper for this, since one would add API surface without adding capability.
+
+```python
+from openai import AsyncOpenAI
+
+client = AsyncOpenAI()
+response = await client.responses.retrieve(response_id)
+print(response.status)
+```
+
+## Cancellation
+
+If the surrounding task is cancelled (`asyncio.CancelledError`) or the poll loop hits an unexpected error before the response reaches a terminal state, the SDK schedules a best-effort `client.responses.cancel(response_id)` so the in-flight server-side response is not leaked. The original exception then propagates to the caller as usual.
+
+## Compatibility
+
+Background mode is **supported only by the HTTP Responses transport** ([`OpenAIResponsesModel`][agents.models.openai_responses.OpenAIResponsesModel]). Setting `background=True` on either of these adapters raises [`UserError`][agents.exceptions.UserError] so the durability guarantee you opted into is not silently demoted:
+
+- [`OpenAIResponsesWSModel`][agents.models.openai_responses.OpenAIResponsesWSModel] — the WebSocket transport always streams and cannot decouple submit from poll.
+- [`OpenAIChatCompletionsModel`][agents.models.openai_chatcompletions.OpenAIChatCompletionsModel] — the Chat Completions API has no `background` parameter.
+
+If you're on a non-OpenAI provider via LiteLLM / AnyLLM, the field is read on `ModelSettings` but not plumbed by those adapters; whether it does anything depends on the underlying provider.
+
+## Limits
+
+- Background responses are retained server-side for **about 10 minutes**.
+- Background mode is **not ZDR-compatible**.
+- The `Runner` does not impose its own deadline on a background poll. If you need a hard ceiling, wrap your call (e.g. `asyncio.wait_for(Runner.run(agent, ...), timeout=600)`); on timeout, the SDK's cancel-on-CancelledError logic still fires.
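The hard-ceiling advice in the Limits section can be illustrated without network access. The sketch below is a minimal simulation, not the SDK's implementation: `FakeResponses` and `fake_background_run` are hypothetical stand-ins for `client.responses` and a background-mode `Runner.run` call, and the 0.05 second timeout is chosen only to keep the demo fast. It shows that `asyncio.wait_for` cancels the inner coroutine on timeout, which gives the cancel-on-`CancelledError` path a chance to schedule the server-side cancel.

```python
import asyncio

# Keep references to fire-and-forget tasks so they are not garbage collected.
_cleanup_tasks: set = set()


class FakeResponses:
    """Hypothetical stand-in for `client.responses`; records cancel calls."""

    def __init__(self) -> None:
        self.cancelled_ids: list[str] = []

    async def cancel(self, response_id: str) -> None:
        self.cancelled_ids.append(response_id)


async def fake_background_run(responses: FakeResponses, response_id: str) -> str:
    """Stand-in for a background-mode run whose response never finishes."""
    try:
        while True:
            await asyncio.sleep(0.01)  # pretend to poll the server
    except asyncio.CancelledError:
        # Same idea as the docs describe: schedule a best-effort cancel,
        # then let the CancelledError propagate.
        task = asyncio.create_task(responses.cancel(response_id))
        _cleanup_tasks.add(task)
        task.add_done_callback(_cleanup_tasks.discard)
        raise


async def run_with_ceiling() -> tuple[bool, list[str]]:
    responses = FakeResponses()
    timed_out = False
    try:
        await asyncio.wait_for(
            fake_background_run(responses, "resp_demo"), timeout=0.05
        )
    except asyncio.TimeoutError:
        timed_out = True
    await asyncio.sleep(0.01)  # let the scheduled cancel task run
    return timed_out, responses.cancelled_ids


timed_out, cancelled_ids = asyncio.run(run_with_ceiling())
print(timed_out, cancelled_ids)  # True ['resp_demo']
```

In real code the inner call would be `Runner.run(agent, ...)` and the timeout would be on the order of minutes; the cancel remains best-effort, so a caller should not assume the server-side response was actually stopped.
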
diff --git a/examples/background_mode/__init__.py b/examples/background_mode/__init__.py
diff --git a/examples/background_mode/main.py b/examples/background_mode/main.py
@@ -0,0 +1,80 @@
+"""Example demonstrating Responses API background mode.
+
+When `ModelSettings(background=True)` is set, the SDK submits the underlying
+`client.responses.create()` call with `background=True` and adaptively polls
+`client.responses.retrieve(...)` until the response reaches a terminal state.
+This lets long-running reasoning calls (gpt-5.2-pro, deep-research-class
+workloads) survive HTTP / proxy / serverless timeouts that would otherwise
+abort a synchronous call.
+
+To run this example:
+
+    export OPENAI_API_KEY=...
+    python -m examples.background_mode.main
+
+Compare the two runs below: with and without `background=True`. The output
+should be equivalent, but only the background variant keeps the server-side
+work alive across transient client-side disconnects.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import os
+
+from agents import Agent, ModelSettings, Runner
+
+MODEL_NAME = os.getenv("BACKGROUND_MODEL_NAME") or "gpt-5.2-pro"
+PROMPT = (
+    "Plan a three-stage research workflow for studying the long-term effects "
+    "of intermittent fasting on cognitive performance. For each stage, list "
+    "the primary research question, the methods, and one specific risk to "
+    "external validity."
+)
+
+
+async def run_synchronous() -> str:
+    agent = Agent(name="planner", model=MODEL_NAME)
+    print("\n=== Without background mode (synchronous) ===")
+    result = await Runner.run(agent, PROMPT)
+    return str(result.final_output)
+
+
+async def run_background() -> str:
+    agent = Agent(
+        name="planner",
+        model=MODEL_NAME,
+        model_settings=ModelSettings(background=True),
+    )
+    print("\n=== With background mode (submit + adaptive poll) ===")
+    result = await Runner.run(agent, PROMPT)
+    return str(result.final_output)
+
+
+async def main() -> None:
+    try:
+        sync_output = await run_synchronous()
+        print(sync_output)
+
+        bg_output = await run_background()
+        print(bg_output)
+
+        # Both runs use the same HTTP Responses transport; only the background
+        # flag differs. Outputs can still differ because sampling is not
+        # deterministic. Background mode's win is durability, not content.
+        if sync_output.strip() == bg_output.strip():
+            print("\nOutputs match.")
+        else:
+            print(
+                "\nOutputs differ — expected when sampling is non-deterministic, "
+                "but the background variant survived any transient disconnects."
+            )
+    except Exception as exc:
+        print(f"Error: {exc}")
+        print("\nNote: background mode is supported only by the Responses API")
+        print("HTTP transport. Set OPENAI_API_KEY and try a model that")
+        print("accepts long-running background requests (e.g. gpt-5.2-pro).")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -64,6 +64,7 @@ plugins:
                 - Guardrails: guardrails.md
                 - Running agents: running_agents.md
                 - Streaming: streaming.md
+                - Background mode: background.md
                 - Agent orchestration: multi_agent.md
                 - Handoffs: handoffs.md
                 - Results: results.md
@@ -213,6 +214,7 @@ plugins:
                 - guardrails.md
                 - running_agents.md
                 - streaming.md
+                - background.md
                 - multi_agent.md
                 - handoffs.md
                 - results.md
@@ -256,6 +258,7 @@ plugins:
                 - guardrails.md
                 - running_agents.md
                 - streaming.md
+                - background.md
                 - multi_agent.md
                 - handoffs.md
                 - results.md
@@ -299,6 +302,7 @@ plugins:
                 - guardrails.md
                 - running_agents.md
                 - streaming.md
+                - background.md
                 - multi_agent.md
                 - handoffs.md
                 - results.md

diff --git a/src/agents/model_settings.py b/src/agents/model_settings.py
@@ -79,6 +79,7 @@ class MCPToolChoice:
     "top_logprobs",
     "retry",
     "context_management",
+    "background",
 )
 
 
@@ -191,6 +192,29 @@ class ModelSettings:
     to enable server-side compaction when the rendered context crosses a token threshold.
     """
 
+    background: bool | None = None
+    """Whether to run the model response in the background.
+
+    When ``True``, the SDK submits via ``client.responses.create(background=True)``
+    and polls ``client.responses.retrieve(...)`` until the response reaches a
+    terminal state. Background mode lets long single-turn calls (reasoning models,
+    deep-research workloads) survive HTTP / proxy / serverless timeouts.
+
+    Only supported by ``OpenAIResponsesModel`` (HTTP transport). Setting this on
+    ``OpenAIResponsesWSModel`` or ``OpenAIChatCompletionsModel`` raises ``UserError``.
+    Background mode is not ZDR-compatible and response data is retained server-side
+    for ~10 minutes.
+    `Learn more <https://platform.openai.com/docs/guides/background>`_.
+    """
+
+    background_poll_interval_seconds: float | None = None
+    """Polling interval (seconds) when ``background=True``.
+
+    When unset, the SDK honors the ``openai-poll-after-ms`` response header from
+    the most recent ``retrieve()``; falls back to 1.0 second when the header is
+    absent. Ignored when ``background`` is not enabled.
+    """
+
     def resolve(self, override: ModelSettings | None) -> ModelSettings:
         """Produce a new ModelSettings by overlaying any non-None values from the
         override on top of this instance."""
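The precedence described in the `background_poll_interval_seconds` docstring (explicit interval, then the `openai-poll-after-ms` header, then the 1.0 second fallback) can be written as a small pure function. This is a sketch for illustration only: `next_poll_interval` is a hypothetical helper, not part of the SDK, and it assumes the header arrives as a string of milliseconds.

```python
from __future__ import annotations

DEFAULT_INTERVAL_SECONDS = 1.0  # same fallback value the docstring names


def next_poll_interval(
    explicit: float | None,
    header_value: str | None,
    current: float = DEFAULT_INTERVAL_SECONDS,
) -> float:
    """Return the delay before the next poll, following the documented precedence."""
    if explicit is not None:
        return explicit  # a pinned interval always wins
    if header_value is None:
        return current  # no server hint: keep the interval already in use
    try:
        return float(header_value) / 1000.0  # header is in milliseconds
    except ValueError:
        return current  # malformed header: keep the current interval


print(next_poll_interval(0.25, "5000"))  # 0.25
print(next_poll_interval(None, "2500"))  # 2.5
print(next_poll_interval(None, None))    # 1.0
print(next_poll_interval(None, "soon"))  # 1.0
```

Pinning the interval trades server-guided backoff for predictability, which is mainly useful in tests or when a proxy enforces its own request budget.
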

diff --git a/src/agents/models/openai_chatcompletions.py b/src/agents/models/openai_chatcompletions.py
@@ -71,6 +71,15 @@ def _non_null_or_omit(self, value: Any) -> Any:
     def _supports_default_prompt_cache_key(self) -> bool:
         return ChatCmplHelpers.is_openai(self._get_client())
 
+    @staticmethod
+    def _handle_unsupported_background(model_settings: ModelSettings) -> None:
+        if model_settings.background:
+            raise UserError(
+                "ModelSettings.background=True is not supported by "
+                "OpenAIChatCompletionsModel; the Chat Completions API has no "
+                "background-mode equivalent. Use OpenAIResponsesModel instead."
+            )
+
     def _handle_unsupported_prompt(self, prompt: ResponsePromptParam | None) -> None:
         if prompt is None:
             return
@@ -140,6 +149,7 @@ async def get_response(
         conversation_id: str | None = None,
         prompt: ResponsePromptParam | None = None,
     ) -> ModelResponse:
+        self._handle_unsupported_background(model_settings)
         self._handle_unsupported_server_managed_conversation_state(
             previous_response_id=previous_response_id,
             conversation_id=conversation_id,
@@ -274,6 +284,7 @@ async def stream_response(
         """
         Yields a partial message as it is generated, as well as the usage information.
         """
+        self._handle_unsupported_background(model_settings)
         self._handle_unsupported_server_managed_conversation_state(
             previous_response_id=previous_response_id,
             conversation_id=conversation_id,

diff --git a/src/agents/models/openai_responses.py b/src/agents/models/openai_responses.py
@@ -92,6 +92,32 @@
     value for value in get_args(ResponseIncludable) if isinstance(value, str)
 )
 
+# Terminal `Response.status` values per the OpenAI Responses API: the subset of
+# the `ResponseStatus` literal type in `openai-python` that marks a finished
+# response. Any other status (`queued` / `in_progress`) means the response is
+# still being generated and must be polled.
+_RESPONSE_TERMINAL_STATUSES: frozenset[str] = frozenset(
+    {"completed", "failed", "cancelled", "incomplete"}
+)
+
+# Default polling interval when `background=True` and no explicit interval or
+# server header is available. Matches the fallback used by openai-python's
+# `create_and_poll` helpers.
+_DEFAULT_BACKGROUND_POLL_INTERVAL_SECONDS = 1.0
+
+# Server-sent hint header advising the next poll delay (in milliseconds). When
+# the caller has not pinned an explicit `background_poll_interval_seconds`, we
+# honor this header so the loop adapts to server backpressure.
+_BACKGROUND_POLL_AFTER_HEADER = "openai-poll-after-ms"
+
+
+def _is_response_terminal_status(status: str | None) -> bool:
+    """True when `status` is a terminal value (or unset, which we treat as
+    terminal to avoid spinning on unexpected payloads)."""
+    if status is None:
+        return True
+    return status in _RESPONSE_TERMINAL_STATUSES
+
 
 class _NamespaceToolParam(TypedDict):
     type: Literal["namespace"]
@@ -444,6 +470,82 @@ def _consume_background_cleanup_task_result(task: asyncio.Task[Any]) -> None:
         except Exception as exc:
             logger.debug(f"Background stream cleanup failed after cancellation: {exc}")
 
+    def _schedule_background_response_cancel(self, client: AsyncOpenAI, response_id: str) -> None:
+        """Best-effort fire-and-forget cancel of an in-flight background response.
+
+        Invoked when the poll loop is cancelled or hits a non-recoverable error
+        before reaching a terminal state, so that server-side compute is not
+        leaked. Failures from the cancel call itself are swallowed.
+        """
+
+        async def _do_cancel() -> None:
+            try:
+                await client.responses.cancel(response_id)
+            except Exception as exc:
+                logger.debug(
+                    f"Background response cancel for {response_id} failed (ignored): {exc}"
+                )
+
+        try:
+            task = asyncio.create_task(_do_cancel())
+        except RuntimeError:
+            # No running loop available (e.g. interpreter shutdown). Nothing we
+            # can do here; the server response will time out on its own.
+            return
+        task.add_done_callback(self._consume_background_cleanup_task_result)
+
+    async def _poll_background_response_until_terminal(
+        self,
+        *,
+        client: AsyncOpenAI,
+        response: Response,
+        poll_interval_seconds: float | None,
+    ) -> Response:
+        """Poll `responses.retrieve(id)` until the response reaches a terminal status.
+
+        When `poll_interval_seconds` is provided it pins the cadence; otherwise the
+        loop honors the `openai-poll-after-ms` response header and falls back to
+        ``_DEFAULT_BACKGROUND_POLL_INTERVAL_SECONDS`` when no header is present.
+        Mirrors the adaptive-polling pattern used by `openai-python`'s
+        `create_and_poll` helpers.
+
+        On cancellation or unexpected error mid-poll, the in-flight server-side
+        response is cancelled best-effort via
+        `_schedule_background_response_cancel` so compute is not leaked.
+        Reaching a non-`completed` terminal state (`failed` / `cancelled` /
+        `incomplete`) raises `ModelBehaviorError`.
+        """
+        response_id = response.id
+        explicit_interval = poll_interval_seconds
+        interval = (
+            explicit_interval
+            if explicit_interval is not None
+            else _DEFAULT_BACKGROUND_POLL_INTERVAL_SECONDS
+        )
+        try:
+            while not _is_response_terminal_status(response.status):
+                await asyncio.sleep(interval)
+                raw = await client.responses.with_raw_response.retrieve(response_id)
+                response = raw.parse()
+                if explicit_interval is None:
+                    header_value = raw.headers.get(_BACKGROUND_POLL_AFTER_HEADER)
+                    if header_value is not None:
+                        try:
+                            interval = float(header_value) / 1000.0
+                        except (TypeError, ValueError):
+                            # Server sent a malformed header; keep current interval.
+                            pass
+        except BaseException:
+            self._schedule_background_response_cancel(client, response_id)
+            raise
+
+        if response.status != "completed":
+            # Non-`completed` terminal status; the server has already finished
+            # so we don't need to cancel. Raise a model-error so callers see a
+            # consistent failure type.
+            raise response_terminal_failure_error(f"response.{response.status}", response)
+        return response
+
     async def get_response(
         self,
         system_instructions: str | None,
@@ -693,7 +795,14 @@ async def _fetch_response(
 
         if not stream:
             response = await client.responses.create(**create_kwargs)
-            return cast(Response, response)
+            response = cast(Response, response)
+            if model_settings.background and not _is_response_terminal_status(response.status):
+                response = await self._poll_background_response_until_terminal(
+                    client=client,
+                    response=response,
+                    poll_interval_seconds=model_settings.background_poll_interval_seconds,
+                )
+            return response
 
         streaming_response = getattr(client.responses, "with_streaming_response", None)
         stream_create = getattr(streaming_response, "create", None)
@@ -849,6 +958,7 @@ def _build_response_create_kwargs(
             "extra_body": model_settings.extra_body,
             "text": response_format,
             "store": self._non_null_or_omit(model_settings.store),
+            "background": self._non_null_or_omit(model_settings.background),
             "prompt_cache_retention": self._non_null_or_omit(model_settings.prompt_cache_retention),
             "reasoning": self._non_null_or_omit(model_settings.reasoning),
             "metadata": self._non_null_or_omit(model_settings.metadata),
@@ -1082,6 +1192,13 @@ async def _fetch_response(
         stream: Literal[True] | Literal[False] = False,
         prompt: ResponsePromptParam | None = None,
     ) -> Response | AsyncIterator[ResponseStreamEvent]:
+        if model_settings.background:
+            raise UserError(
+                "ModelSettings.background=True is not supported by "
+                "OpenAIResponsesWSModel; the WebSocket transport always streams "
+                "and cannot decouple submit from poll. Use OpenAIResponsesModel "
+                "(HTTP transport) instead."
+            )
         create_kwargs = self._build_response_create_kwargs(
             system_instructions=system_instructions,
             input=input,
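The submit-then-poll flow added to `openai_responses.py` can be exercised offline with a scripted client. The sketch below is a simplified model of that loop under stated assumptions: `FakeResponse` and `FakeResponses` are hypothetical stand-ins for the real `Response` object and `client.responses`, the interval is fixed, and `RuntimeError` is a placeholder for whatever error the SDK raises on a non-`completed` terminal status.

```python
import asyncio
from dataclasses import dataclass

TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled", "incomplete"})


@dataclass
class FakeResponse:
    id: str
    status: str


class FakeResponses:
    """Hypothetical stand-in for `client.responses`; replays scripted statuses."""

    def __init__(self, statuses: list[str]) -> None:
        self._statuses = iter(statuses)
        self.retrieve_calls = 0

    async def retrieve(self, response_id: str) -> FakeResponse:
        self.retrieve_calls += 1
        return FakeResponse(id=response_id, status=next(self._statuses))


async def poll_until_terminal(
    responses: FakeResponses, response: FakeResponse, interval: float
) -> FakeResponse:
    # Keep polling while the status is still `queued` or `in_progress`.
    while response.status not in TERMINAL_STATUSES:
        await asyncio.sleep(interval)
        response = await responses.retrieve(response.id)
    if response.status != "completed":
        # Placeholder error type; the diff raises an SDK-specific error here.
        raise RuntimeError(f"response ended with status {response.status!r}")
    return response


async def demo() -> tuple[str, int]:
    responses = FakeResponses(["in_progress", "in_progress", "completed"])
    first = FakeResponse(id="resp_demo", status="queued")
    final = await poll_until_terminal(responses, first, interval=0.01)
    return final.status, responses.retrieve_calls


final_status, retrieve_calls = asyncio.run(demo())
print(final_status, retrieve_calls)  # completed 3
```

The sketch omits the header-driven interval and the cancel-on-error path so that the terminal-status check stays easy to read; both are present in the actual diff.
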