
feat: Responses API background mode (background=True + adaptive polling) #3472

Closed
dtlics wants to merge 6 commits into openai:main from dtlics:feat/responses-background-mode


dtlics commented May 20, 2026

Refs #3471.

Draft. Design questions in #3471 are still open — posting this PR alongside the issue to make the design concrete, not to request review. Will mark ready for review after maintainer feedback on the four open questions.

What this branch does

  • Appends background: bool | None and background_poll_interval_seconds: float | None to ModelSettings (preserves positional ordering per AGENTS.md).
  • Adds a submit-and-adaptive-poll loop to OpenAIResponsesModel.get_response via a new private _poll_background_response_until_terminal. Honors openai-poll-after-ms response headers; explicit background_poll_interval_seconds overrides; falls back to 1.0s. On asyncio.CancelledError or terminal failure, schedules client.responses.cancel(id) fire-and-forget so server-side work doesn't leak.
  • stream_response is unchanged at the call-site level — background=True flows through _build_response_create_kwargs into the streaming responses.create() call, giving server-side durability without client-side resume logic.
  • OpenAIChatCompletionsModel and OpenAIResponsesWSModel raise UserError when background=True is set, so users don't silently lose the durability guarantee they opted into.
  • Docs page at docs/background.md registered in mkdocs.yml; runnable example at examples/background_mode/main.py.
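
The polling cadence described above follows a three-level precedence: explicit setting, then server hint, then the 1.0s fallback. A minimal sketch of that selection, assuming the header value is a string of milliseconds. The function name `choose_poll_interval` and the handling of malformed header values are illustrative assumptions, not code from the branch:

```python
from __future__ import annotations

import math
from collections.abc import Mapping

DEFAULT_POLL_INTERVAL_SECONDS = 1.0  # fallback stated in the PR description


def choose_poll_interval(
    explicit_seconds: float | None,
    headers: Mapping[str, str],
) -> float:
    """Pick the next sleep: explicit setting, then server hint, then fallback."""
    # 1. An explicit background_poll_interval_seconds always wins.
    if explicit_seconds is not None:
        return explicit_seconds

    # 2. Otherwise honor the server's openai-poll-after-ms hint when usable.
    raw = headers.get("openai-poll-after-ms")
    if raw is not None:
        try:
            hinted_ms = float(raw)
        except ValueError:
            hinted_ms = math.nan  # assumption: ignore a malformed hint
        if math.isfinite(hinted_ms) and hinted_ms >= 0:
            return hinted_ms / 1000.0

    # 3. No usable hint: fall back to the fixed default.
    return DEFAULT_POLL_INTERVAL_SECONDS
```

A plain `dict` lookup is case-sensitive, so real code would read the hint from the HTTP client's case-insensitive headers object instead.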

Diff: 791 insertions, 1 deletion across 10 files. No changes to run.py, anything under run_internal/, or run_state.py; CURRENT_SCHEMA_VERSION is not bumped.

Design choices currently reflected in the code

These map to the four open questions in #3471. All are easy to flip — happy to restructure based on maintainer preference.

  1. Surface — ModelSettings, not RunConfig. Joins the existing family of model-call toggles (store, reasoning, prompt_cache_retention, context_management) and gives per-agent granularity in multi-agent runs.
  2. Non-Responses backends: UserError, not silent no-op. Loud failure, on the reasoning that users opted into a server-side durability guarantee that those transports can't provide. Counterpoint (raised in #3471): reasoning / verbosity already silently no-op on those backends, so picking the opposite policy here is an inconsistency. Easy to switch to silent no-op if preferred.
  3. Streaming — server-side durability only, no client-side starting_after auto-resume. Mirrors plain openai-python's behavior (it exposes responses.stream(response_id=..., starting_after=N) as a primitive but doesn't auto-resume on disconnect).
  4. Retrieve-call retries — AsyncOpenAI.max_retries only (partial). Each retrieve call uses the client's built-in max_retries for transient HTTP failures. Honest gap: the plan's intent was that an exhausted-retry failure during polling should not propagate up to get_response_with_retry and trigger a fresh submit+poll cycle (which would discard a possibly-minutes-long in-flight reasoning response). The current code does not mark such exceptions as non-retriable, so outer-envelope replay can still happen subject to retry policy. The cancel-on-CancelledError half of Q4 is implemented; the suppress-outer-retry half is not. Will fix once Q4 lands one way or the other.
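
The unimplemented half of item 4 could take roughly the following shape. This is a sketch under stated assumptions, not the branch's code: `BackgroundPollError` and `call_with_retry` are hypothetical names, and the real outer envelope (`get_response_with_retry`) is driven by a retry policy that is not modeled here.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable


class BackgroundPollError(Exception):
    """Hypothetical marker: polling failed after the response was submitted."""

    def __init__(self, response_id: str, cause: BaseException) -> None:
        super().__init__(f"polling {response_id} failed: {cause!r}")
        self.response_id = response_id


async def call_with_retry(
    submit_and_poll: Callable[[], Awaitable[str]],
    max_attempts: int = 3,
) -> str:
    """Stand-in for the outer retry envelope around a model call."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await submit_and_poll()
        except BackgroundPollError:
            # Never resubmit: the in-flight response may still complete,
            # and a fresh submit would discard that work.
            raise
        except ConnectionError:
            # Ordinary transient failure before submit succeeded: replay.
            if attempt == max_attempts:
                raise
    raise AssertionError("unreachable")
```

Carrying `response_id` on the exception would let a caller retrieve or cancel the in-flight response rather than start over.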

Test plan

make format && make lint && make typecheck && make tests all pass on the branch:

  • make format: ruff format: 778 files left unchanged. ruff check --fix: All checks passed.
  • make lint: ruff check: All checks passed.
  • make typecheck: mypy: Success, no issues found in 772 source files. pyright: 0 errors, 0 warnings, 0 informations.
  • make tests: parallel: 4589 passed, 2 skipped in 16.61s. Serial: 27 passed, 5 skipped, 4590 deselected in 4.82s.

New tests (in tests/models/test_openai_responses.py and tests/models/test_openai_chatcompletions.py) cover: terminal-on-first-response fast path, queued→in_progress→completed multi-poll, terminal failed / cancelled / incomplete raises, openai-poll-after-ms header honored, explicit interval overrides header, CancelledError during poll schedules responses.cancel, extra_args={"background": True} conflict raises TypeError, streaming pass-through, and Chat Completions / WS UserError rejections.

Not verified by me: make build-docs / make build-full-docs were not run. The new docs page is registered in all four language nav sections in mkdocs.yml (en/ja/ko/zh) on the assumption that the existing translation pipeline fills in the non-English content; this is worth a reviewer's eye.

Out of scope (proposed follow-ups, mirroring #3471)

  • Stream auto-resume via responses.stream(response_id=..., starting_after=N) on transport errors.
  • Cross-process Runner resume / continuation tokens — would need RunState schema bump.
  • ZDR enforcement — background mode is not ZDR-compatible and retains response data for ~10 minutes server-side.
  • Auto-detection of when to use background mode based on model / payload.
  • OpenAIResponsesModel.retrieve_response() helper — users can call client.responses.retrieve(id) directly on the underlying client, so this would add public API surface without adding capability.

dtlics added 6 commits May 20, 2026 15:18
…onds fields

Append two optional fields to ModelSettings to opt into Responses API
background mode. background=True submits via responses.create(background=True)
and adaptively polls responses.retrieve(id) until terminal; the optional
background_poll_interval_seconds pins the cadence and, when unset, polling
defers to the openai-poll-after-ms response header.

Fields are appended at the end of the dataclass per AGENTS.md's positional
compatibility rule. background is added to _TRACEABLE_MODEL_SETTING_FIELDS so
the flag is captured in spans; the interval is operational metadata and is
intentionally excluded.
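
To illustrate why appending matters for positional compatibility, here is a minimal sketch using a stand-in dataclass. `ModelSettingsSketch` and its two leading fields are an illustrative subset chosen for this example, not the SDK's actual field list or order:

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class ModelSettingsSketch:
    # Pre-existing fields (illustrative subset; real order is an assumption).
    temperature: float | None = None
    store: bool | None = None
    # New fields go last, so existing positional calls keep their meaning.
    background: bool | None = None
    background_poll_interval_seconds: float | None = None


# A call written before the change still binds to the same fields.
legacy = ModelSettingsSketch(0.2, True)
field_names = [f.name for f in fields(ModelSettingsSketch)]
```

Had the new fields been inserted at the front, `ModelSettingsSketch(0.2, True)` would have bound `0.2` to `background` instead.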
…und mode

When ModelSettings.background is True, OpenAIResponsesModel.get_response now
submits via responses.create(background=True), then polls responses.retrieve(id)
until the response reaches a terminal status (completed | failed | cancelled |
incomplete). Streaming pass-through is unchanged: stream_response forwards
background=True to responses.create(stream=True, background=True) for
server-side durability without client-side auto-resume.

Polling honors the openai-poll-after-ms response header for adaptive intervals
(matches openai-python's create_and_poll pattern); an explicit
background_poll_interval_seconds overrides the header; the fallback is 1.0s.

On asyncio.CancelledError or a non-recoverable error mid-poll, the SDK
schedules a fire-and-forget responses.cancel(id) so server-side compute is
not leaked, then re-raises. Non-completed terminal states raise the existing
response_terminal_failure_error helper.

background is plumbed through _build_response_create_kwargs alongside store
and prompt_cache_retention, so the existing extra_args duplicate-key check
catches accidental double-spec.
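
The poll-and-cancel behavior in this commit message can be sketched as follows. `FakeResponse`, `FakeResponsesClient`, and `poll_until_terminal` are hypothetical stand-ins so the example runs without network access; the RuntimeError stands in for the SDK's terminal-failure helper, and the sketch only covers cancellation and errors raised while polling:

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass

TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


@dataclass
class FakeResponse:
    id: str
    status: str


class FakeResponsesClient:
    """Hypothetical stand-in for client.responses with scripted statuses."""

    def __init__(self, statuses: list[str]) -> None:
        self._statuses = list(statuses)
        self.cancelled_ids: list[str] = []

    async def retrieve(self, response_id: str) -> FakeResponse:
        # Walk the script; repeat the last status once it is exhausted.
        if len(self._statuses) > 1:
            status = self._statuses.pop(0)
        else:
            status = self._statuses[0]
        return FakeResponse(response_id, status)

    async def cancel(self, response_id: str) -> None:
        self.cancelled_ids.append(response_id)


# Strong references so fire-and-forget tasks are not garbage-collected.
_pending_cancels: set[asyncio.Task] = set()


async def poll_until_terminal(
    client: FakeResponsesClient, response: FakeResponse, interval: float
) -> FakeResponse:
    try:
        while response.status not in TERMINAL_STATUSES:
            await asyncio.sleep(interval)
            response = await client.retrieve(response.id)
    except (asyncio.CancelledError, Exception):
        # Schedule the server-side cancel without awaiting it, then re-raise.
        task = asyncio.ensure_future(client.cancel(response.id))
        _pending_cancels.add(task)
        task.add_done_callback(_pending_cancels.discard)
        raise
    if response.status != "completed":
        raise RuntimeError(f"response {response.id} ended as {response.status}")
    return response
```

Because the cancel is fire-and-forget, it only runs if the event loop keeps running after the caller's task is cancelled.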
…ompletions adapters

Setting ModelSettings.background=True on an adapter that cannot honor it
must fail loudly rather than silently drop the durability guarantee the
caller opted into:

- OpenAIResponsesWSModel: the WebSocket transport always streams and cannot
  decouple submit from poll. Raise UserError in the overridden
  _fetch_response so both get_response and stream_response paths are covered.

- OpenAIChatCompletionsModel: the Chat Completions API has no background
  parameter. Add _handle_unsupported_background and call it at the top of
  get_response and stream_response, mirroring the existing
  _handle_unsupported_prompt pattern.
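
A guard of the kind described above might look like this. It is a sketch with stand-in types: the `UserError` and `ModelSettings` classes below are local placeholders for the SDK's, and `reject_unsupported_background` is a hypothetical name, not the branch's `_handle_unsupported_background`:

```python
from __future__ import annotations

from dataclasses import dataclass


class UserError(Exception):
    """Local stand-in for the SDK's UserError."""


@dataclass
class ModelSettings:
    """Local stand-in carrying only the field this sketch needs."""

    background: bool | None = None


def reject_unsupported_background(settings: ModelSettings, backend: str) -> None:
    """Fail loudly when a backend cannot honor background=True."""
    # None and False both mean the caller did not opt in, so they pass.
    if settings.background:
        raise UserError(
            f"background=True is not supported by {backend}; "
            "use the Responses HTTP backend instead."
        )
```

Calling the guard at the top of both the non-streaming and streaming entry points keeps the two paths from diverging.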
… and rejections

Add 15 tests for the new background mode:

- terminal-on-first-response (no poll triggered)
- multi-poll until completed
- terminal failures (failed | cancelled | incomplete) raise ModelBehaviorError
- openai-poll-after-ms header drives the next sleep interval
- explicit background_poll_interval_seconds overrides the header
- asyncio.CancelledError mid-poll schedules a fire-and-forget responses.cancel(id)
  and re-raises (uses a real-sleep handle captured pre-monkeypatch to avoid
  re-tripping the cancel after the test undoes the patch)
- background=True is plumbed into the responses.create() kwargs
- extra_args={"background": True} + ModelSettings.background=True surfaces
  the existing duplicate-key TypeError
- streaming + background passes through unchanged
- OpenAIResponsesWSModel rejects background=True from both get_response
  and stream_response
- OpenAIChatCompletionsModel rejects background=True from both get_response
  and stream_response

Update test_all_fields_serialization to set the two new ModelSettings fields
so the "every field non-None" invariant still holds.

New docs/background.md describes the transparent use through Runner, the
streaming pass-through, retrieving a response by id via the underlying
AsyncOpenAI client, the cancel-on-CancelledError behavior, supported
backends (Responses HTTP only — WS and Chat Completions raise UserError),
and the platform limits (~10-minute retention, not ZDR-compatible).

Registered under "Background mode" in all four language nav sections in
mkdocs.yml. Translated content for ja/ko/zh will be generated by the
existing docs translation pipeline.

examples/background_mode/main.py runs the same prompt twice — once
synchronously, once with ModelSettings(background=True) — to demonstrate
that opting into background mode is a one-field change at the Agent level
and produces equivalent final output, with the durability win coming from
the underlying submit + poll transport rather than from the SDK API.

seratch (Member) commented May 20, 2026

Closing this for this reason: #3471 (comment). Thanks for your interest here.

seratch closed this May 20, 2026