feat: Responses API background mode (background=True + adaptive polling) #3472
Closed
dtlics wants to merge 6 commits into
…onds fields
Append two optional fields to ModelSettings to opt into Responses API background mode. background=True submits via responses.create(background=True) and adaptively polls responses.retrieve(id) until the response reaches a terminal status; the optional background_poll_interval_seconds pins the cadence, and when it is unset the cadence defers to the openai-poll-after-ms response header. Fields are appended at the end of the dataclass per AGENTS.md's positional compatibility rule. background is added to _TRACEABLE_MODEL_SETTING_FIELDS so the flag is captured in spans; the interval is operational metadata and is intentionally excluded.
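The positional compatibility rule can be illustrated with a small sketch. SettingsSketch below is a hypothetical, trimmed-down stand-in, not the real ModelSettings; the field order of the two pre-existing fields is assumed for illustration only. The point is that appending optional fields with None defaults leaves positional callers untouched.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class SettingsSketch:
    """Hypothetical, trimmed-down stand-in for ModelSettings."""

    # Pre-existing fields keep their positions (order assumed for illustration).
    temperature: float | None = None
    store: bool | None = None
    # New optional fields go last so positional callers are unaffected.
    background: bool | None = None
    background_poll_interval_seconds: float | None = None


# A caller that relied on positional arguments before the change still works.
legacy = SettingsSketch(0.2, True)
# A caller that opts in names the new field explicitly.
opted_in = SettingsSketch(store=True, background=True)
field_names = [f.name for f in fields(SettingsSketch)]
```

Had the new fields been inserted before store, the positional call above would have silently assigned True to background instead.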
…und mode
When ModelSettings.background is True, OpenAIResponsesModel.get_response now submits via responses.create(background=True), then polls responses.retrieve(id) until the response reaches a terminal status (completed | failed | cancelled | incomplete). Streaming pass-through is unchanged: stream_response forwards background=True to responses.create(stream=True, background=True) for server-side durability without client-side auto-resume.
Polling honors the openai-poll-after-ms response header for adaptive intervals (matches openai-python's create_and_poll pattern); an explicit background_poll_interval_seconds overrides the header; the fallback is 1.0s.
On asyncio.CancelledError or a non-recoverable error mid-poll, the SDK schedules a fire-and-forget responses.cancel(id) so server-side compute is not leaked, then re-raises. Non-completed terminal states raise the existing response_terminal_failure_error helper.
background is plumbed through _build_response_create_kwargs alongside store and prompt_cache_retention, so the existing extra_args duplicate-key check catches accidental double-spec.
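The polling behavior described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the branch's code: FakeResponse, FakeClient, next_interval, and poll_until_terminal are hypothetical names, the header is modeled as a plain attribute, and RuntimeError stands in for whatever the response_terminal_failure_error helper raises.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass

TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}
FALLBACK_INTERVAL_SECONDS = 1.0
_pending_cancels: set = set()  # keeps fire-and-forget tasks referenced


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a Responses API object plus its poll header."""

    id: str
    status: str
    poll_after_ms: str | None = None  # models the openai-poll-after-ms header


class FakeClient:
    """Hypothetical in-memory client that replays a scripted status sequence."""

    def __init__(self, statuses):
        self._statuses = list(statuses)
        self.cancelled = []

    async def retrieve(self, response_id):
        return FakeResponse(response_id, self._statuses.pop(0), poll_after_ms="5")

    async def cancel(self, response_id):
        self.cancelled.append(response_id)


def next_interval(poll_after_ms, override):
    """Explicit override first, then the header hint, then the 1.0s fallback."""
    if override is not None:
        return override
    try:
        return max(int(poll_after_ms), 0) / 1000.0
    except (TypeError, ValueError):
        return FALLBACK_INTERVAL_SECONDS


async def poll_until_terminal(client, response, override=None, sleep=asyncio.sleep):
    try:
        while response.status not in TERMINAL_STATUSES:
            await sleep(next_interval(response.poll_after_ms, override))
            response = await client.retrieve(response.id)
    except asyncio.CancelledError:
        # Schedule the server-side cancel without awaiting it, then re-raise.
        task = asyncio.ensure_future(client.cancel(response.id))
        _pending_cancels.add(task)
        task.add_done_callback(_pending_cancels.discard)
        raise
    if response.status != "completed":
        # Assumption: RuntimeError stands in for the SDK's terminal-failure error.
        raise RuntimeError(f"response {response.id} ended as {response.status}")
    return response
```

Passing sleep as a parameter keeps the loop testable without real delays, which is similar in spirit to the monkeypatched sleep the tests in this PR describe.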
…ompletions adapters
Setting ModelSettings.background=True on an adapter that cannot honor it must fail loudly rather than silently drop the durability guarantee the caller opted into:
- OpenAIResponsesWSModel: the WebSocket transport always streams and cannot decouple submit from poll. Raise UserError in the overridden _fetch_response so both get_response and stream_response paths are covered.
- OpenAIChatCompletionsModel: the Chat Completions API has no background parameter. Add _handle_unsupported_background and call it at the top of get_response and stream_response, mirroring the existing _handle_unsupported_prompt pattern.
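A guard of this kind might look roughly like the sketch below. The names reject_unsupported_background and get_response_sketch are hypothetical, and UserError is redefined locally as a stand-in so the snippet runs without the SDK; the actual _handle_unsupported_background signature is not shown on this page.

```python
class UserError(Exception):
    """Local stand-in for the SDK's UserError, so the sketch runs on its own."""


def reject_unsupported_background(background, adapter_name):
    """Raise when background mode is requested on a transport that lacks it."""
    if background:
        raise UserError(
            f"{adapter_name} does not support background=True; "
            "use the Responses HTTP model or unset the flag."
        )


def get_response_sketch(background=None):
    # The guard runs first, before any request is built or sent.
    reject_unsupported_background(background, "OpenAIChatCompletionsModel")
    return "response"
```

Only a truthy value trips the guard, so callers that leave the field as None or set it to False are unaffected.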
… and rejections
Add 15 tests for the new background mode:
- terminal-on-first-response (no poll triggered)
- multi-poll until completed
- terminal failures (failed | cancelled | incomplete) raise ModelBehaviorError
- openai-poll-after-ms header drives the next sleep interval
- explicit background_poll_interval_seconds overrides the header
- asyncio.CancelledError mid-poll schedules a fire-and-forget responses.cancel(id) and re-raises (uses a real-sleep handle captured pre-monkeypatch to avoid re-tripping the cancel after the test undoes the patch)
- background=True is plumbed into the responses.create() kwargs
- extra_args={"background": True} + ModelSettings.background=True surfaces the existing duplicate-key TypeError
- streaming + background passes through unchanged
- OpenAIResponsesWSModel rejects background=True from both get_response and stream_response
- OpenAIChatCompletionsModel rejects background=True from both get_response and stream_response
Update test_all_fields_serialization to set the two new ModelSettings fields
so the "every field non-None" invariant still holds.
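The duplicate-key TypeError mentioned in the test list can arise from plain Python keyword unpacking. The sketch below shows that mechanism with hypothetical names (create, call_create); whether the SDK relies on this or on an explicit check is not stated here.

```python
def create(**kwargs):
    """Stand-in for responses.create that just echoes its keyword arguments."""
    return kwargs


def call_create(background=None, extra_args=None):
    create_kwargs = {}
    if background is not None:
        create_kwargs["background"] = background
    # Unpacking two mappings that share a key raises TypeError at call time.
    return create(**create_kwargs, **(extra_args or {}))
```

Setting the flag in only one place works; setting it in both ModelSettings-style kwargs and extra_args fails before any request would be sent.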
New docs/background.md describes the transparent use through Runner, the streaming pass-through, retrieving a response by id via the underlying AsyncOpenAI client, the cancel-on-CancelledError behavior, supported backends (Responses HTTP only — WS and Chat Completions raise UserError), and the platform limits (~10-minute retention, not ZDR-compatible). Registered under "Background mode" in all four language nav sections in mkdocs.yml. Translated content for ja/ko/zh will be generated by the existing docs translation pipeline.
examples/background_mode/main.py runs the same prompt twice — once synchronously, once with ModelSettings(background=True) — to demonstrate that opting into background mode is a one-field change at the Agent level and produces equivalent final output, with the durability win coming from the underlying submit + poll transport rather than from the SDK API.
Member
Closing this for this reason: #3471 (comment). Thanks for your interest here.
Refs #3471.
What this branch does
- Adds background: bool | None and background_poll_interval_seconds: float | None to ModelSettings (preserves positional ordering per AGENTS.md).
- Submit-and-poll in OpenAIResponsesModel.get_response via a new private _poll_background_response_until_terminal. Honors openai-poll-after-ms response headers; explicit background_poll_interval_seconds overrides; falls back to 1.0s. On asyncio.CancelledError or terminal failure, schedules client.responses.cancel(id) fire-and-forget so server-side work doesn't leak.
- stream_response is unchanged at the call-site level: background=True flows through _build_response_create_kwargs into the streaming responses.create() call, giving server-side durability without client-side resume logic.
- OpenAIChatCompletionsModel and OpenAIResponsesWSModel raise UserError when background=True is set, so users don't silently lose the durability guarantee they opted into.
- docs/background.md registered in mkdocs.yml; runnable example at examples/background_mode/main.py.

Diff: 791 insertions, 1 deletion across 10 files. No changes to run.py, anything under run_internal/, or run_state.py; CURRENT_SCHEMA_VERSION is not bumped.

Design choices currently reflected in the code
These map to the four open questions in #3471. All are easy to flip; happy to restructure based on maintainer preference.
- ModelSettings, not RunConfig. Joins the existing family of model-call toggles (store, reasoning, prompt_cache_retention, context_management) and gives per-agent granularity in multi-agent runs.
- UserError, not silent no-op. Loud failure, on the reasoning that users opted into a server-side durability guarantee that those transports can't provide. Counterpoint (raised in Support Responses API background mode in Runner (background=True + adaptive polling) #3471): reasoning / verbosity already silently no-op on those backends, so picking the opposite policy here is an inconsistency. Easy to switch to silent no-op if preferred.
- No starting_after auto-resume. Mirrors plain openai-python's behavior (it exposes responses.stream(response_id=..., starting_after=N) as a primitive but doesn't auto-resume on disconnect).
- AsyncOpenAI.max_retries only (partial). Each retrieve call uses the client's built-in max_retries for transient HTTP failures. Honest gap: the plan's intent was that an exhausted-retry failure during polling should not propagate up to get_response_with_retry and trigger a fresh submit+poll cycle (which would discard a possibly-minutes-long in-flight reasoning response). The current code does not mark such exceptions as non-retriable, so outer-envelope replay can still happen subject to retry policy. The cancel-on-CancelledError half of Q4 is implemented; the suppress-outer-retry half is not. Will fix once Q4 lands one way or the other.

Test plan
make format && make lint && make typecheck && make tests all pass on the branch:
- make format: ruff format left 778 files unchanged; ruff check --fix: All checks passed.
- make lint: ruff check: All checks passed.
- make typecheck: mypy: Success, no issues found in 772 source files. pyright: 0 errors, 0 warnings, 0 informations.
- make tests: parallel: 4589 passed, 2 skipped in 16.61s. Serial: 27 passed, 5 skipped, 4590 deselected in 4.82s.

New tests (in tests/models/test_openai_responses.py and tests/models/test_openai_chatcompletions.py) cover: terminal-on-first-response fast path, queued→in_progress→completed multi-poll, terminal failed / cancelled / incomplete raises, openai-poll-after-ms header honored, explicit interval overrides header, CancelledError during poll schedules responses.cancel, extra_args={"background": True} conflict raises TypeError, streaming pass-through, and Chat Completions / WS UserError rejections.

Not verified by me: make build-docs / make build-full-docs were not run. The new docs page is registered in all four language nav sections in mkdocs.yml (en/ja/ko/zh) on the assumption that the existing translation pipeline fills in non-English content, which is worth a reviewer's eye.

Out of scope (proposed follow-ups, mirroring #3471)
- Streaming auto-resume via responses.stream(response_id=..., starting_after=N) on transport errors.
- Runner resume / continuation tokens: would need a RunState schema bump.
- OpenAIResponsesModel.retrieve_response() helper: users can call client.responses.retrieve(id) directly on the underlying client, so this would add public API surface without adding capability.