Support Realtime custom voice objects #3473
Open
lionel-oai wants to merge 1 commit into
seratch requested changes on May 20, 2026
    return normalized

def _create_realtime_audio_output(audio_output_args: dict[str, Any]) -> Any:
Member
If we upgrade the openai package to openai>=2.36.0, this workaround is no longer necessary, whereas _normalize_custom_voice_for_server_event_validation is still required even with the latest version.
Can you add brief TODO comments to these internal workarounds explaining why each one exists and when it can be removed?
Summary
This PR fixes Realtime custom voice handling in the Agents SDK.

Realtime sessions can receive and send structured custom voice objects such as {"id": "voice_..."}, but the SDK previously typed voice settings as strings and validated inbound server events before updating response lifecycle state. If a server event such as response.created or response.done contained a structured voice object that failed validation, the SDK could skip response state updates and leave the response-create sequencer blocked. That could prevent the next response.create from being sent after tool output.

The change adds typed support for custom voice objects in Realtime session settings, preserves structured voices when building outbound session.update payloads, and adds a validation fallback for inbound server events so custom voice objects do not break response lifecycle tracking.

Tests

make format
make lint
uv run pytest -q tests/realtime/test_openai_realtime.py tests/realtime/test_realtime_model_settings.py
uv run pytest -q tests/realtime/test_session.py -k "handoff_session_update_preserves_custom_voice or handoff_tool_handling"
uv run mypy src/agents/realtime/config.py src/agents/realtime/openai_realtime.py tests/realtime/test_openai_realtime.py
uv run pyright src/agents/realtime/config.py src/agents/realtime/openai_realtime.py tests/realtime/test_openai_realtime.py
uv run mypy tests/realtime/test_session.py
uv run pyright tests/realtime/test_session.py

Full make tests / make typecheck were not completed locally because optional dependency installation was blocked by a socket-firewall tunnel failure while downloading docstring-parser==0.18.0.
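The failure mode and fallback described in the summary can be sketched roughly as follows. This is a simplified, self-contained illustration, not the SDK's code: ResponseTracker, validate_server_event, normalize_custom_voice, the flattened event shape (voice directly under "response"), and the voice id "voice_123" are all hypothetical stand-ins.

```python
import copy
from typing import Any, Optional


class StrictValidationError(ValueError):
    """Raised by the stand-in validator below."""


def validate_server_event(event: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for a strict schema that only accepts string voices.
    voice = event.get("response", {}).get("voice")
    if voice is not None and not isinstance(voice, str):
        raise StrictValidationError("voice must be a string")
    return event


def normalize_custom_voice(event: dict[str, Any]) -> dict[str, Any]:
    # Work on a copy and replace a {"id": ...} voice with its id string.
    normalized = copy.deepcopy(event)
    response = normalized.get("response")
    if isinstance(response, dict):
        voice = response.get("voice")
        if isinstance(voice, dict) and isinstance(voice.get("id"), str):
            response["voice"] = voice["id"]
    return normalized


class ResponseTracker:
    """Hypothetical tracker for whether a response is in flight."""

    def __init__(self) -> None:
        self.response_in_progress = False

    def handle(self, event: dict[str, Any]) -> Optional[dict[str, Any]]:
        # Update lifecycle state from the raw event type first, so a
        # validation failure cannot leave the tracker stuck.
        event_type = event.get("type")
        if event_type == "response.created":
            self.response_in_progress = True
        elif event_type == "response.done":
            self.response_in_progress = False

        try:
            return validate_server_event(event)
        except StrictValidationError:
            # Fallback: retry validation on a normalized copy.
            try:
                return validate_server_event(normalize_custom_voice(event))
            except StrictValidationError:
                return None


tracker = ResponseTracker()
created = {"type": "response.created", "response": {"voice": {"id": "voice_123"}}}
parsed = tracker.handle(created)
```

In this sketch the original event is left untouched, and the lifecycle flag is updated regardless of whether validation succeeds, which is the property the summary says the old ordering lacked.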