Realtime Client Proposal #7285

Open
tarekgh wants to merge 38 commits into dotnet:main from tarekgh:RealtimeClientProposal

tarekgh (Member) commented Feb 11, 2026

Realtime Client Proposal

⚠️ Important Notes

  • This is an experimental proposal. All APIs introduced here are subject to change, and breaking changes should be expected as the design evolves.
  • The OpenAI provider currently uses raw WebSocket/JSON rather than the OpenAI .NET SDK's realtime support. The SDK (v2.8.0) does not yet include the latest Realtime API updates (the relevant PR was recently merged). Once a new SDK version is released, the provider will be refactored to use it, eliminating the manual JSON handling.

Overview

This PR introduces a Realtime Client abstraction layer for Microsoft.Extensions.AI, enabling bidirectional, streaming communication with realtime AI services (e.g., OpenAI's Realtime API). The design follows the same middleware/pipeline patterns established by IChatClient and extends them to realtime sessions over WebSocket connections.

Key changes include:

  • New abstractions (IRealtimeClient, IRealtimeSession, DelegatingRealtimeSession) in Microsoft.Extensions.AI.Abstractions
  • Strongly-typed client/server message types for audio streaming, text, transcription, function calls, and error handling
  • Immutable session configuration via RealtimeSessionOptions with init-only properties and IReadOnlyList<T> for collection types
  • Extensible server message types: RealtimeServerMessageType is a readonly struct (following the ChatRole pattern) rather than a fixed enum, allowing providers to define custom message types
  • Middleware pipeline via RealtimeSessionBuilder with built-in support for:
    • Logging (LoggingRealtimeSession)
    • OpenTelemetry (OpenTelemetryRealtimeSession) following GenAI semantic conventions
    • Function invocation (FunctionInvokingRealtimeSession) with automatic tool call resolution
  • OpenAI provider implementation (OpenAIRealtimeClient, OpenAIRealtimeSession) using WebSocket connections
  • Refactored function invocation — extracted shared logic from FunctionInvokingChatClient into reusable components (FunctionInvocationProcessor, FunctionInvocationHelpers, FunctionInvocationLogger) so both chat and realtime sessions share the same invocation pipeline
  • Unified TranscriptionOptions — shared between Realtime and ISpeechToText APIs

Core API Surface

IRealtimeSession

public interface IRealtimeSession : IDisposable, IAsyncDisposable
{
    Task UpdateAsync(RealtimeSessionOptions options, CancellationToken cancellationToken = default);
    RealtimeSessionOptions? Options { get; }
    Task SendClientMessageAsync(RealtimeClientMessage message, CancellationToken cancellationToken = default);
    IAsyncEnumerable<RealtimeServerMessage> GetStreamingResponseAsync(CancellationToken cancellationToken = default);
    object? GetService(Type serviceType, object? serviceKey = null);
}

Client messages are sent via SendClientMessageAsync at any time during the session. Server messages are consumed by enumerating GetStreamingResponseAsync. This separation allows middleware to intercept both directions independently.
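
The separation between the send path and the receive stream can be illustrated with a minimal sketch. It does not use the proposed types: IDuplexSession, EchoSession, and CountingSession are hypothetical stand-ins invented for this example, with plain strings in place of RealtimeClientMessage and RealtimeServerMessage. The point is only that a delegating wrapper can override each direction on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Demo: wrap a fake session and observe both directions independently.
var session = new CountingSession(new EchoSession());
await session.SendAsync("hello");
await foreach (var reply in session.ReceiveAsync())
{
    Console.WriteLine(reply);
}
Console.WriteLine($"sent={session.Sent}, received={session.Received}");

// Hypothetical stand-in for IRealtimeSession, reduced to its two directions.
public interface IDuplexSession
{
    Task SendAsync(string message, CancellationToken cancellationToken = default);
    IAsyncEnumerable<string> ReceiveAsync(CancellationToken cancellationToken = default);
}

// Fake provider: every message sent is queued and echoed back on the receive stream.
public sealed class EchoSession : IDuplexSession
{
    private readonly Queue<string> _pending = new();

    public Task SendAsync(string message, CancellationToken cancellationToken = default)
    {
        _pending.Enqueue(message);
        return Task.CompletedTask;
    }

    public async IAsyncEnumerable<string> ReceiveAsync(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        while (_pending.Count > 0)
        {
            cancellationToken.ThrowIfCancellationRequested();
            await Task.Yield();
            yield return "echo: " + _pending.Dequeue();
        }
    }
}

// Middleware: intercepts each direction separately and delegates to the inner session.
public sealed class CountingSession : IDuplexSession
{
    private readonly IDuplexSession _inner;

    public CountingSession(IDuplexSession inner) => _inner = inner;

    public int Sent { get; private set; }
    public int Received { get; private set; }

    public Task SendAsync(string message, CancellationToken cancellationToken = default)
    {
        Sent++; // outbound interception point
        return _inner.SendAsync(message, cancellationToken);
    }

    public async IAsyncEnumerable<string> ReceiveAsync(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await foreach (var message in _inner.ReceiveAsync(cancellationToken).ConfigureAwait(false))
        {
            Received++; // inbound interception point
            yield return message;
        }
    }
}
```

Because the wrapper implements the same interface as the session it wraps, several such wrappers can be stacked, which is the shape the builder-based pipeline in this proposal relies on.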


Supported Realtime Messages

Client Messages (sent to the server)

| Message Type | Description |
| --- | --- |
| RealtimeClientConversationItemCreateMessage | Creates a conversation item (text, audio, or image) to add to the session context. |
| RealtimeClientInputAudioBufferAppendMessage | Appends a chunk of audio data (PCM) to the server's input audio buffer. |
| RealtimeClientInputAudioBufferCommitMessage | Commits the accumulated audio buffer, signaling the server that the audio input is complete. |
| RealtimeClientResponseCreateMessage | Triggers model inference to generate a response. Properties optionally override session-level configuration for this response only. |

Server Messages (received from the server)

| Message Type | Description |
| --- | --- |
| RealtimeServerOutputTextAudioMessage | Carries incremental or completed text (via Text) and audio (via Audio) output from the model. |
| RealtimeServerInputAudioTranscriptionMessage | Carries transcription results (incremental or completed) for user audio input. |
| RealtimeServerResponseCreatedMessage | Indicates a response has been created or completed; includes token usage on ResponseDone. |
| RealtimeServerResponseOutputItemMessage | Represents a new output item (e.g., function call) added during response generation. |
| RealtimeServerErrorMessage | Carries error details including ErrorMessageId to correlate with the originating client message. |

Server Message Types (RealtimeServerMessageType — extensible readonly struct)

| Type | Description |
| --- | --- |
| RawContentOnly | Unrecognized/provider-specific event with raw data in RawRepresentation. |
| OutputTextDelta / OutputTextDone | Incremental / final text output. |
| OutputAudioDelta / OutputAudioDone | Incremental / final audio output. |
| OutputAudioTranscriptionDelta / OutputAudioTranscriptionDone | Model-generated transcription of audio output. |
| InputAudioTranscriptionDelta / InputAudioTranscriptionCompleted / InputAudioTranscriptionFailed | Transcription of user audio input. |
| ResponseCreated / ResponseDone | Response lifecycle events. |
| ResponseOutputItemAdded / ResponseOutputItemDone | Output item lifecycle events. |
| Error | Server error event. |
| McpCallInProgress / McpCallCompleted / McpCallFailed | MCP tool call lifecycle. |
| McpListToolsInProgress / McpListToolsCompleted / McpListToolsFailed | MCP tool listing lifecycle. |

Design Decisions

  • RealtimeSessionOptions uses init-only properties — The options object exposed via IRealtimeSession.Options is immutable after creation. To update session configuration, create a new RealtimeSessionOptions instance and call UpdateAsync. Collection properties (Tools, OutputModalities) use IReadOnlyList<T>.
  • RealtimeServerMessageType is a readonly struct (not an enum) — Follows the ChatRole extensibility pattern. Providers can define custom message types; unrecognized events use RawContentOnly with data in RawRepresentation.
  • Single SendClientMessageAsync method — There is one way to send client messages, making middleware interception straightforward. GetStreamingResponseAsync takes no input parameter.
  • RawRepresentationFactory on RealtimeSessionOptions — Allows consumers to provide provider-specific configuration without abstraction leakage.
  • Separate client/server message type hierarchies — Provides type safety at the API boundary and mirrors the ChatMessage/ChatResponse pattern.
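
To make the "readonly struct instead of an enum" decision concrete, here is a minimal sketch of the smart-enum shape. MessageKind is a hypothetical name used only for illustration; it is not the proposed RealtimeServerMessageType, the JSON converter is omitted, and the case-insensitive comparison is an assumption of this sketch rather than something the description states.

```csharp
using System;

// A provider-defined value can be created without changing the abstraction.
var custom = new MessageKind("vendor.custom_event");

Console.WriteLine(MessageKind.Error == new MessageKind("error")); // True
Console.WriteLine(custom == MessageKind.Error);                    // False
Console.WriteLine(custom);                                         // vendor.custom_event

// Struct values are not compile-time constants, so they cannot appear in
// constant patterns; comparisons use == (or a `when` clause) instead.
if (custom != MessageKind.Error && custom != MessageKind.ResponseDone)
{
    Console.WriteLine("unrecognized kind, fall back to raw handling");
}

// Hypothetical smart-enum type: a string-backed value with well-known instances.
public readonly struct MessageKind : IEquatable<MessageKind>
{
    public static MessageKind Error { get; } = new("error");
    public static MessageKind ResponseDone { get; } = new("response_done");

    public MessageKind(string value)
    {
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new ArgumentException("Value must be non-empty.", nameof(value));
        }

        Value = value;
    }

    public string Value { get; }

    // Assumption for this sketch: comparisons ignore case.
    public bool Equals(MessageKind other) =>
        string.Equals(Value, other.Value, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object? obj) => obj is MessageKind other && Equals(other);

    // default(MessageKind) has a null Value, so guard before hashing.
    public override int GetHashCode() =>
        Value is null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(Value);

    public static bool operator ==(MessageKind left, MessageKind right) => left.Equals(right);

    public static bool operator !=(MessageKind left, MessageKind right) => !left.Equals(right);

    public override string ToString() => Value ?? string.Empty;
}
```

The trade-off is that exhaustive `switch` checking over a closed enum is lost, in exchange for providers being able to surface events the abstraction does not know about.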

Usage Examples

1. Creating a Realtime Client and Session

using Microsoft.Extensions.AI;

IRealtimeClient realtimeClient = new OpenAIRealtimeClient(apiKey: "your-api-key", model: "gpt-4o-realtime-preview");
IRealtimeSession session = await realtimeClient.CreateSessionAsync();

2. Enabling Middleware

var builder = new RealtimeSessionBuilder(session)
    .UseFunctionInvocation(configure: f =>
    {
        f.AdditionalTools = [getWeatherFunction];
        f.MaximumIterationsPerRequest = 10;
    })
    .UseOpenTelemetry(configure: otel => otel.EnableSensitiveData = true)
    .UseLogging();

IRealtimeSession wrappedSession = builder.Build(services);

3. Configuring the Session

await wrappedSession.UpdateAsync(new RealtimeSessionOptions
{
    OutputModalities = ["audio"],
    Instructions = "You are a helpful assistant.",
    Voice = "alloy",
    VoiceSpeed = 1.0,
    TranscriptionOptions = new TranscriptionOptions { ModelId = "whisper-1", SpeechLanguage = "en" },
    VoiceActivityDetection = new VoiceActivityDetection { CreateResponse = true },
    Tools = [getWeatherFunction]
});

4. Sending and Receiving Messages

var cts = new CancellationTokenSource();

// Start listening for server messages
_ = Task.Run(async () =>
{
    await foreach (var msg in wrappedSession.GetStreamingResponseAsync(cts.Token))
    {
        switch (msg)
        {
            case RealtimeServerOutputTextAudioMessage audio
                when audio.Type == RealtimeServerMessageType.OutputAudioDelta:
                PlayAudio(audio.Audio);
                break;
            case RealtimeServerOutputTextAudioMessage text
                when text.Type == RealtimeServerMessageType.OutputTextDelta:
                Console.Write(text.Text);
                break;
            case RealtimeServerErrorMessage error:
                Console.WriteLine($"Error: {error.Error?.Message}");
                break;
        }
    }
});

// Send a text message
var item = new RealtimeContentItem(
    [new TextContent("What's the weather in Seattle?")],
    role: ChatRole.User);
await wrappedSession.SendClientMessageAsync(new RealtimeClientConversationItemCreateMessage(item: item), cts.Token);
await wrappedSession.SendClientMessageAsync(new RealtimeClientResponseCreateMessage(), cts.Token);

// Send audio
await wrappedSession.SendClientMessageAsync(new RealtimeClientInputAudioBufferAppendMessage(
    audioContent: new DataContent($"data:audio/pcm;base64,{Convert.ToBase64String(pcmBytes)}")), cts.Token);
await wrappedSession.SendClientMessageAsync(new RealtimeClientInputAudioBufferCommitMessage(), cts.Token);
await wrappedSession.SendClientMessageAsync(new RealtimeClientResponseCreateMessage(), cts.Token);

5. Ending the Session

cts.Cancel();
wrappedSession.Dispose();

Demo Application

A complete application consuming the new realtime interfaces can be found at: RealtimeProposalDemoApp

github-actions bot added the area-ai (Microsoft.Extensions.AI libraries) label Feb 11, 2026
tarekgh marked this pull request as ready for review February 11, 2026 03:12
tarekgh requested review from a team as code owners February 11, 2026 03:12
Copilot AI review requested due to automatic review settings February 11, 2026 03:12
tarekgh added this to the 11.0 milestone Feb 11, 2026

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces an experimental Realtime Client / Session abstraction for Microsoft.Extensions.AI, including middleware-style session pipelines (logging, OpenTelemetry, function invocation) and an initial OpenAI realtime provider, while refactoring function-invocation logic to be shared across chat and realtime flows.

Changes:

  • Add IRealtimeClient / IRealtimeSession abstractions plus realtime message/option types (audio, transcription, response items, errors, etc.).
  • Add RealtimeSessionBuilder pipeline + middleware implementations (LoggingRealtimeSession, OpenTelemetryRealtimeSession, FunctionInvokingRealtimeSession).
  • Refactor shared function invocation into reusable internal components (FunctionInvocationProcessor, helpers, logger), used by both chat and realtime.

Reviewed changes

Copilot reviewed 62 out of 63 changed files in this pull request and generated 10 comments.

File Description
test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/RealtimeSessionExtensionsTests.cs Unit tests for IRealtimeSession.GetService<T>() extension behavior.
test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/RealtimeSessionBuilderTests.cs Unit tests for RealtimeSessionBuilder pipeline behavior and ordering.
test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/LoggingRealtimeSessionTests.cs Unit tests validating logging middleware behavior across methods and log levels.
test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/FunctionInvokingRealtimeSessionTests.cs Unit tests for function invocation behavior in realtime streaming.
test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/DelegatingRealtimeSessionTests.cs Unit tests for base delegating session behavior (delegation, disposal, services).
test/Libraries/Microsoft.Extensions.AI.Tests/Microsoft.Extensions.AI.Tests.csproj Includes shared TestRealtimeSession in test compilation.
test/Libraries/Microsoft.Extensions.AI.OpenAI.Tests/OpenAIRealtimeSessionTests.cs Unit tests for OpenAI realtime session basic behaviors and guardrails.
test/Libraries/Microsoft.Extensions.AI.OpenAI.Tests/OpenAIRealtimeClientTests.cs Unit tests for OpenAI realtime client creation and service exposure.
test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/TestRealtimeSession.cs Test double for IRealtimeSession with callback hooks.
test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeSessionOptionsTests.cs Tests for RealtimeSessionOptions and related option types.
test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeServerMessageTests.cs Tests for server message types and their property roundtrips.
test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeContentItemTests.cs Tests for RealtimeContentItem construction and mutation.
test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeClientMessageTests.cs Tests for client message types and their properties.
test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeAudioFormatTests.cs Tests for RealtimeAudioFormat behavior.
src/Libraries/Microsoft.Extensions.AI/Realtime/RealtimeSessionExtensions.cs Adds GetService<T>() extension for IRealtimeSession.
src/Libraries/Microsoft.Extensions.AI/Realtime/RealtimeSessionBuilderRealtimeSessionExtensions.cs Adds AsBuilder() extension for sessions.
src/Libraries/Microsoft.Extensions.AI/Realtime/RealtimeSessionBuilder.cs Implements session middleware/pipeline builder.
src/Libraries/Microsoft.Extensions.AI/Realtime/OpenTelemetryRealtimeSessionBuilderExtensions.cs Builder extension to add OpenTelemetry middleware to a realtime session.
src/Libraries/Microsoft.Extensions.AI/Realtime/LoggingRealtimeSessionBuilderExtensions.cs Builder extension to add logging middleware to a realtime session.
src/Libraries/Microsoft.Extensions.AI/Realtime/LoggingRealtimeSession.cs Delegating session middleware that logs calls and streaming messages.
src/Libraries/Microsoft.Extensions.AI/Realtime/FunctionInvokingRealtimeSessionBuilderExtensions.cs Builder extension to add function invocation middleware.
src/Libraries/Microsoft.Extensions.AI/Realtime/FunctionInvokingRealtimeSession.cs Implements tool/function invocation loop for realtime streaming.
src/Libraries/Microsoft.Extensions.AI/Realtime/AnonymousDelegatingRealtimeSession.cs Anonymous delegate-based middleware for streaming interception.
src/Libraries/Microsoft.Extensions.AI/OpenTelemetryConsts.cs Extends OpenTelemetry constants for realtime and token subcategories.
src/Libraries/Microsoft.Extensions.AI/Common/FunctionInvocationStatus.cs Shared internal status enum for invocation outcomes.
src/Libraries/Microsoft.Extensions.AI/Common/FunctionInvocationProcessor.cs Shared processor implementing serial/parallel invocation with instrumentation.
src/Libraries/Microsoft.Extensions.AI/Common/FunctionInvocationLogger.cs Shared logger messages used by chat and realtime invocation flows.
src/Libraries/Microsoft.Extensions.AI/Common/FunctionInvocationHelpers.cs Shared helpers (activity detection, elapsed time, tool map creation).
src/Libraries/Microsoft.Extensions.AI/ChatCompletion/FunctionInvokingChatClient.cs Refactors chat function invocation to use shared processor/helpers/logger.
src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIRealtimeClient.cs Adds OpenAI realtime client implementation that creates/initializes sessions.
src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIClientExtensions.cs Adds AsIRealtimeClient extension for OpenAI client integration.
src/Libraries/Microsoft.Extensions.AI.OpenAI/Microsoft.Extensions.AI.OpenAI.csproj Adds internals visibility for tests and Channels dependency (non-net10).
src/Libraries/Microsoft.Extensions.AI.Evaluation.Reporting/CSharp/Microsoft.Extensions.AI.Evaluation.Reporting.csproj Comment formatting change.
src/Libraries/Microsoft.Extensions.AI.Abstractions/UsageDetails.cs Adds realtime-specific token breakdown fields.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Tools/ToolChoiceMode.cs Adds tool choice mode enum for realtime use.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/VoiceActivityDetection.cs Adds VAD options type.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/TranscriptionOptions.cs Adds transcription configuration type.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/ServerVoiceActivityDetection.cs Adds server VAD settings.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/SemanticVoiceActivityDetection.cs Adds semantic VAD settings.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeSessionOptions.cs Adds session configuration options (audio formats, tools, tracing, etc.).
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeSessionKind.cs Adds session kind enum (realtime vs transcription).
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerResponseOutputItemMessage.cs Adds server message for output items.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerResponseCreatedMessage.cs Adds server message for response lifecycle/usage metadata.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerOutputTextAudioMessage.cs Adds server message for output text/audio streaming.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerMessageType.cs Adds server message type enum.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerMessage.cs Adds base server message type.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerInputAudioTranscriptionMessage.cs Adds server transcription message type.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerErrorMessage.cs Adds server error message type.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeContentItem.cs Adds realtime conversation item wrapper.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientResponseCreateMessage.cs Adds client response request message type (modalities/tools/etc.).
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientMessage.cs Adds base client message type.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientInputAudioBufferCommitMessage.cs Adds client message for committing audio input buffer.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientInputAudioBufferAppendMessage.cs Adds client message for appending audio input buffer.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientConversationItemCreateMessage.cs Adds client message for creating a conversation item.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeAudioFormat.cs Adds audio format specification type.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/NoiseReductionOptions.cs Adds noise reduction options enum.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/IRealtimeSession.cs Adds realtime session interface.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/IRealtimeClient.cs Adds realtime client interface.
src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/DelegatingRealtimeSession.cs Adds base delegating session implementation.

shyamnamboodiripad (Contributor) left a comment

Signing off on behalf of eval (so that the whitespace change in Reporting.csproj does not block merge)

tarekgh and others added 4 commits February 11, 2026 14:13
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The extension method on OpenAIClient was not useful because it
completely ignored the OpenAIClient instance - only validating it
for null before creating a new OpenAIRealtimeClient with the
separately provided apiKey and model parameters.

Users can construct OpenAIRealtimeClient directly instead.
- Fix RealtimeSessionExtensions XML doc to reference IRealtimeSession
  instead of IChatClient
- Replace non-standard <ref name> tags with <see cref> in
  RealtimeServerMessageType.cs for proper IntelliSense/doc rendering
- Fix ResponseDone doc summary to say 'completed' instead of 'created'
- Add missing Throw.IfNull(updates) in LoggingRealtimeSession
  .GetStreamingResponseAsync for consistency with other sessions
- Split RealtimeServerMessageType enum: add ResponseOutputItemDone
  and ResponseOutputItemAdded to distinguish per-item events
  (response.output_item.done, conversation.item.done) from
  whole-response events (response.done, response.created)

- Fix function result serialization: use JsonSerializer.Serialize()
  instead of ToString() to properly serialize complex objects

- Fix OTel streaming duration: start stopwatch at method entry
  instead of immediately before recording, so duration histogram
  measures actual streaming time

- URL-encode model name in WebSocket URI for defensive safety

- Fix OTel metadata tag ordering: apply user metadata before
  standard tags so standard OTel attributes take precedence
  if keys collide
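
Two of the fixes listed in the commit message above (serializing function results with JsonSerializer.Serialize() rather than ToString(), and URL-encoding the model name) can be sketched as follows. This is an illustration only: WireFormat and its members are hypothetical names, and the host in the URI is a placeholder, not the provider's endpoint.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

var result = new Dictionary<string, object> { ["city"] = "Seattle", ["tempF"] = 54 };

// ToString() on a complex object yields its type name, not its contents.
Console.WriteLine(result.ToString());

// JsonSerializer.Serialize() yields the data the model actually needs.
Console.WriteLine(WireFormat.SerializeResult(result)); // {"city":"Seattle","tempF":54}

// Escaping keeps spaces, slashes, and '&' in a model name from altering the query string.
Console.WriteLine(WireFormat.BuildUri("my model/v1&x"));

// Hypothetical helper type for this sketch.
public static class WireFormat
{
    // Serializes a function result so that complex objects round-trip as JSON.
    public static string SerializeResult(object? result) => JsonSerializer.Serialize(result);

    // Placeholder host; only the escaping of the model name is the point here.
    public static string BuildUri(string model) =>
        "wss://example.invalid/realtime?model=" + Uri.EscapeDataString(model);
}
```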
/// <summary>
/// Gets the current session options.
/// </summary>
RealtimeSessionOptions? Options { get; }

Member

I'm unclear as to the semantic of this. RealtimeSessionOptions is a mutable class. If I start setting properties on that while the session is active, is that going to result in immediate changes in behavior?

Member Author

No, the abstraction includes an Update Session operation that must be called to update the session. I need a type to use when updating the session (it must be a writable object), and I also want to expose the same information to anyone requesting it at any time; in that case, it can be read-only.
The reason is that, in the middleware layer, I need access to the session properties. OpenAI models allow updating the session after it has been created, but I believe not all providers allow that. Therefore, in most scenarios, I expect that once the session is created with the desired configuration, it will not change much afterward.

I am trying to avoid having two types for that. Do you have a better idea for handling that?

Member Author

Maybe we can make the session options use init accessors instead of setters. This will make the object immutable. I believe this will solve the confusion and will be a clearer design. I'll try that and see how it goes.

Member Author

I’ve made the RealtimeSessionOptions properties non-settable so the object is immutable after creation. This makes the design clearer — the only way to modify session settings now is through UpdateAsync.
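
As a rough illustration of that resolution, the sketch below uses init-only properties so that an options instance cannot be changed after construction, and a new instance is passed to an update method instead. SessionOptions and FakeSession are hypothetical stand-ins, not the proposed RealtimeSessionOptions or IRealtimeSession, and the update is synchronous only to keep the example short.

```csharp
using System;
using System.Collections.Generic;

var initial = new SessionOptions
{
    Instructions = "You are a helpful assistant.",
    Voice = "alloy",
    OutputModalities = new[] { "audio" },
};

// initial.Voice = "verse"; // would not compile: the property is init-only

// Changing configuration means building a new instance and applying it explicitly.
var updated = new SessionOptions
{
    Instructions = initial.Instructions,
    Voice = "verse",
    OutputModalities = initial.OutputModalities,
};

var session = new FakeSession();
session.Update(initial);
session.Update(updated);

Console.WriteLine(session.Options?.Voice); // verse
Console.WriteLine(initial.Voice);          // alloy (the earlier instance is untouched)

// Hypothetical options type with init-only properties and a read-only collection.
public sealed class SessionOptions
{
    public string? Instructions { get; init; }
    public string? Voice { get; init; }
    public IReadOnlyList<string>? OutputModalities { get; init; }
}

// Hypothetical session: the only way to change Options is to call Update.
public sealed class FakeSession
{
    public SessionOptions? Options { get; private set; }

    public void Update(SessionOptions options) =>
        Options = options ?? throw new ArgumentNullException(nameof(options));
}
```

Middleware that reads Options therefore sees a consistent snapshot: it can change only when an update call replaces it.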

/// <remarks>
/// This method allows for the injection of client messages into the session at any time, which can be used to influence the session's behavior or state.
/// </remarks>
Task InjectClientMessageAsync(RealtimeClientMessage message, CancellationToken cancellationToken = default);

Member

I'm not sure about the word "Inject"... is that standard terminology used by the providers? Is this just "Send"? How does this relate to GetStreamingResponseAsync... is this only valid when someone is actively enumerating?

Member Author

I’ve renamed it to Send instead of Inject.

I also understand your concern about having two separate ways to send client messages. Originally, I introduced that approach so middleware (like function invocation) could send client messages, since it wasn’t straightforward to append a new item to the IAsyncEnumerable<RealtimeClientMessage> passed into GetStreamingResponseAsync. With that in mind, I simplified the interface to provide a single, consistent way to send client messages, while still working cleanly with the middleware.

Here are the changes:

  • Renamed InjectClientMessageAsync -> SendClientMessageAsync
  • Updated GetStreamingResponseAsync(IAsyncEnumerable<RealtimeClientMessage> updates, CancellationToken) to GetStreamingResponseAsync(CancellationToken) and removed the updates parameter

Please let me know if this feels cleaner and better structured.

/// <param name="cancellationToken">A token to cancel the operation.</param>
/// <returns>The response messages generated by the session.</returns>
/// <remarks>
/// This method cannot be called multiple times concurrently on the same session instance.

Member

Should the session itself be enumerable?

Member Author

I did consider that, but I believe keeping GetStreamingResponseAsync is preferable to making the session itself IAsyncEnumerable, for a few reasons:

  • Middleware delegation becomes awkward with IAsyncEnumerable.
    DelegatingRealtimeSession would need to implement GetAsyncEnumerator() and delegate to the inner session’s enumerator. Overriding a virtual method like GetStreamingResponseAsync follows the established pattern in this repo (e.g., IChatClient.GetStreamingResponseAsync) and keeps things consistent.

  • Re-enumeration semantics are unclear.
    With IAsyncEnumerable, what happens if GetAsyncEnumerator() is called twice? With an explicit method, we can clearly document that it “cannot be called concurrently,” which feels more natural and predictable. With IAsyncEnumerable, the behavior is less obvious.

  • CancellationToken handling is less discoverable.
    While IAsyncEnumerable<T>.GetAsyncEnumerator(CancellationToken) does support cancellation, it’s not as explicit as a CancellationToken parameter on a method. The [EnumeratorCancellation] pattern is also less intuitive and easier to overlook.

Overall, keeping GetStreamingResponseAsync makes the design clearer, more consistent with the existing abstractions, and easier to reason about.
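
For readers less familiar with the cancellation point, the following sketch shows the two ways a token can reach an async iterator: as an explicit method argument, and through WithCancellation, which only works because of the [EnumeratorCancellation] attribute. Demo and CountAsync are hypothetical names for this example and are unrelated to the session types in the PR.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// 1) Token passed explicitly as a method argument (the shape the PR keeps).
Console.WriteLine(await Demo.ConsumeWithArgumentAsync(stopAfter: 3));

// 2) Token attached by the consumer via WithCancellation.
Console.WriteLine(await Demo.ConsumeWithExtensionAsync(stopAfter: 3));

public static class Demo
{
    // Endless stream of integers; stops only when the token is cancelled.
    public static async IAsyncEnumerable<int> CountAsync(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (int i = 1; ; i++)
        {
            await Task.Delay(5, cancellationToken);
            yield return i;
        }
    }

    // Returns how many items were observed before cancellation took effect.
    public static async Task<int> ConsumeWithArgumentAsync(int stopAfter)
    {
        using var cts = new CancellationTokenSource();
        int seen = 0;
        try
        {
            await foreach (int n in CountAsync(cts.Token))
            {
                seen++;
                if (n == stopAfter) cts.Cancel();
            }
        }
        catch (OperationCanceledException)
        {
            // Expected: the next Task.Delay observes the cancelled token.
        }

        return seen;
    }

    // Same consumer, but the token flows through GetAsyncEnumerator(token).
    public static async Task<int> ConsumeWithExtensionAsync(int stopAfter)
    {
        using var cts = new CancellationTokenSource();
        int seen = 0;
        try
        {
            await foreach (int n in CountAsync().WithCancellation(cts.Token))
            {
                seen++;
                if (n == stopAfter) cts.Cancel();
            }
        }
        catch (OperationCanceledException)
        {
            // Expected only because [EnumeratorCancellation] forwards the token.
        }

        return seen;
    }
}
```

Both calls observe three items before cancellation takes effect. Without the attribute, the second variant would never see the token, which is the kind of easy-to-overlook behavior the comment above refers to.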

/// <summary>
/// For far-field microphones.
/// </summary>
FarField

Member

Do any providers have the notion of "Auto"?

Member Author

I did some research and did not see "auto" used at all with noise reduction. Gemini, for example, exposes no configuration to specify it. Mistral's Voxtral handles this via transcription_delay_ms and model selection: a near-field strategy uses a lower delay (e.g., 240ms) and a far-field strategy uses a higher delay (e.g., 480ms to 1200ms).

await _sendLock.WaitAsync(cancellationToken).ConfigureAwait(false);
lockTaken = true;

await _webSocket.SendAsync(

Member

Oh! This isn't using the OpenAI library's realtime support? Why not?

Member Author

Somehow, I had the impression of not taking a dependency on third-party libraries. Looks like I was wrong 🥹. I'll look at that and update. Thanks!

Member Author (tarekgh), Feb 14, 2026

I looked at the OpenAI SDK; it looks like the latest package, 2.8.0, doesn't have the updates for the Realtime model. I see they merged PR openai/openai-dotnet#928 two days ago. I think we need to wait for them to publish a new version, and then we can consume it. I'll keep an eye on that.

tarekgh force-pushed the RealtimeClientProposal branch from 81300f4 to 8ad70f5 February 12, 2026 19:59
tarekgh force-pushed the RealtimeClientProposal branch from 8ad70f5 to fbdc7cb February 12, 2026 20:16
Tarek Mahmoud Sayed added 6 commits February 12, 2026 12:49
- Move TranscriptionOptions from Realtime/ to SpeechToText/ folder
- Change experimental flag from AIRealTime to AISpeechToText
- Make properties nullable with parameterless constructor
- Rename Language to SpeechLanguage, Model to ModelId
- Replace SpeechToTextOptions.ModelId and .SpeechLanguage with Transcription property
- Update all consumers and tests
Tarek Mahmoud Sayed and others added 22 commits February 13, 2026 16:26
- Make OpenAIRealtimeSession constructor and ConnectAsync public
- Remove InternalsVisibleToTest from csproj
- Remove OpenAIRealtimeSessionSerializationTests (depended on internal ConnectWithWebSocketAsync)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add Func<IRealtimeSession, object?>? RawRepresentationFactory property following
the same pattern used by ChatOptions, EmbeddingGenerationOptions, and other
abstraction options types. Add note in OpenAIRealtimeSession to consume the
factory when switching to the OpenAI SDK.
Remove EnableAutoTracing, TracingGroupId, TracingWorkflowName, and
TracingMetadata from the abstraction layer. These are OpenAI-specific
and should be configured via RawRepresentationFactory when the OpenAI
SDK dependency is added.
Remove redundant AIFunction and HostedMcpServerTool properties from
RealtimeSessionOptions and RealtimeClientResponseCreateMessage. Callers
should use ChatToolMode.RequireSpecific(functionName) instead.

Update OpenAI serialization to emit structured tool_choice JSON object
when RequireSpecific is used. Update OpenTelemetry and tests accordingly.
Add separate Audio property for Base64-encoded audio data. Text is now
only used for text and transcript content. Update OpenAI parser,
OpenTelemetry session, and tests accordingly.
Follow the ChatRole smart-enum pattern: readonly struct with string
Value, IEquatable, operators, and JsonConverter. Providers can now
define custom message types by constructing new instances.

Update pattern-matching in OpenTelemetryRealtimeSession to use ==
comparisons instead of constant patterns.
Remove the Parameter property from RealtimeServerErrorMessage and
map error.param to ErrorContent.Details instead. Improve ErrorEventId
XML docs to clarify it correlates to the originating client event.
Add object? RawRepresentation to hold the original provider data
structure, following the same pattern as other types in the
abstraction layer (e.g., ChatMessage). Updated tests accordingly.
Rename the Metadata property to AdditionalProperties on both
RealtimeClientResponseCreateMessage and RealtimeServerResponseCreatedMessage
to be consistent with the established pattern used across the AI
abstractions (ChatMessage, ChatOptions, AIContent, etc.). Updated
XML docs, OpenAI provider, OTel session, and tests accordingly.
Clarify that MaxOutputTokens is a total budget across all output
modalities (text, audio) and tool calls, not per-modality.
Clarify that ExcludeFromConversation creates an out-of-band response
whose output is not added to conversation history. Document that
Instructions, Tools, ToolMode, OutputModalities, OutputAudioOptions,
and OutputVoice are per-response overrides of session configuration.
Clarify that this message triggers model inference and that its
properties are per-response overrides of session configuration.
Rename EventId to MessageId on RealtimeClientMessage and
RealtimeServerMessage, and ErrorEventId to ErrorMessageId on
RealtimeServerErrorMessage. The abstraction uses 'message' terminology
throughout (class names, docs, method signatures), so properties
should match. The OpenAI provider maps MessageId to/from the wire
protocol's event_id field.
Rename to match the established MediaType naming convention used
across the abstractions (DataContent, HostedFileContent, UriContent,
ImageGenerationOptions). Updated OpenAI provider and tests.
…entMessageAsync and remove updates parameter from GetStreamingResponseAsync

- Rename InjectClientMessageAsync -> SendClientMessageAsync across all implementations
- Remove IAsyncEnumerable<RealtimeClientMessage> updates parameter from GetStreamingResponseAsync
- Move per-message telemetry from WrapClientMessagesForTelemetryAsync into SendClientMessageAsync override in OpenTelemetryRealtimeSession
- Delete WrapUpdatesWithLoggingAsync from LoggingRealtimeSession
- Delete WrapClientMessagesForTelemetryAsync from OpenTelemetryRealtimeSession
- Update AnonymousDelegatingRealtimeSession delegate signature
- Update RealtimeSessionBuilder.Use overload signature
- Update all tests to use new API

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Labels

area-ai Microsoft.Extensions.AI libraries

3 participants