Add real-time audio streaming support (Microphone ASR) - c# #485

Open
rui-ren wants to merge 3 commits into main from ruiren/audio-streaming-support-sdk

rui-ren commented Mar 5, 2026

Summary

Adds real-time audio streaming support to the Foundry Local C# SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI ASR.

The existing OpenAIAudioClient only supports file-based transcription. This PR introduces a new OpenAIAudioStreamingClient that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async stream.

What's included

New files

  • src/OpenAI/AudioStreamingClient.cs — Main streaming client with StartAsync(), PushAudioDataAsync(), GetTranscriptionStream(), StopAsync()
  • src/OpenAI/AudioStreamTranscriptionTypes.cs — AudioStreamTranscriptionResult and CoreErrorResponse types
  • test/FoundryLocal.Tests/AudioStreamingClientTests.cs — 14 unit tests for types, serialization, and settings
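
For reviewers who want to picture the call pattern, here is a minimal sketch of how the four methods listed above might be used together. It runs against an in-memory fake, not the SDK: `FakeStreamingClient`, `TranscriptSketch`, and every signature shown are assumptions for illustration, and only the method names come from this PR. The chunk size assumes 16 kHz, 16-bit mono PCM, which the description does not specify.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical result shape: the PR only says results are partial or final.
public sealed record TranscriptSketch(string Text, bool IsFinal);

// In-memory fake that mirrors the four method names from the PR; signatures are guesses.
public sealed class FakeStreamingClient
{
    private readonly Channel<TranscriptSketch> _results =
        Channel.CreateUnbounded<TranscriptSketch>();
    private int _bytes;

    public Task StartAsync(CancellationToken ct = default) => Task.CompletedTask;

    public ValueTask PushAudioDataAsync(ReadOnlyMemory<byte> pcm, CancellationToken ct = default)
    {
        _bytes += pcm.Length;
        // Emit a partial result for every chunk received.
        return _results.Writer.WriteAsync(new TranscriptSketch($"{_bytes} bytes heard", false), ct);
    }

    public IAsyncEnumerable<TranscriptSketch> GetTranscriptionStream(CancellationToken ct = default)
        => _results.Reader.ReadAllAsync(ct);

    public Task StopAsync(CancellationToken ct = default)
    {
        // Emit one final result, then complete the stream so readers finish.
        _results.Writer.TryWrite(new TranscriptSketch($"{_bytes} bytes total", true));
        _results.Writer.Complete();
        return Task.CompletedTask;
    }
}

public static class StreamingUsageSketch
{
    public static async Task<List<TranscriptSketch>> RunAsync()
    {
        var client = new FakeStreamingClient();
        await client.StartAsync();

        // Consume results concurrently with pushing, as a console or UI app would.
        var reader = Task.Run(async () =>
        {
            var seen = new List<TranscriptSketch>();
            await foreach (var r in client.GetTranscriptionStream()) seen.Add(r);
            return seen;
        });

        for (var i = 0; i < 3; i++)
            await client.PushAudioDataAsync(new byte[3200]); // 100 ms at the assumed 16 kHz, 16-bit mono

        await client.StopAsync(); // completes the stream so the reader loop ends
        return await reader;
    }
}
```

The point of the sketch is the shape of the interaction: start once, push from a producer, read the async stream from a separate consumer, and stop to flush the final result.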

Modified files

  • src/IModel.cs — Added GetAudioStreamingClientAsync() to interface
  • src/Model.cs — Delegates to SelectedVariant
  • src/ModelVariant.cs — Implementation (checks model loaded, returns client)
  • src/Detail/ICoreInterop.cs — Added StreamingRequestBuffer struct, AudioStreamSession record, 3 new interface methods
  • src/Detail/CoreInterop.cs — Added 3 P/Invoke bindings (audio_stream_start, audio_stream_push, audio_stream_stop) and managed implementations with GCHandle lifecycle
  • src/Detail/JsonSerializationContext.cs — Registered new types for AOT compatibility
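
The GCHandle handling mentioned for CoreInterop.cs can be illustrated without the native library. The sketch below is not the PR's code: `CallbackLifetimeSketch`, `Session`, and the `ResultCallback` signature are hypothetical, and the places where audio_stream_start and audio_stream_stop would be called are marked only with comments. It shows the general pattern of rooting a callback delegate while native code may hold its function pointer, and freeing the handle on both the stop path and the error path.

```csharp
using System;
using System.Runtime.InteropServices;

public static class CallbackLifetimeSketch
{
    // Hypothetical callback shape; the real native signature is not given in this description.
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate void ResultCallback(IntPtr utf8Json, int length);

    public sealed class Session : IDisposable
    {
        private GCHandle _callbackHandle;

        public IntPtr CallbackPointer { get; }

        public bool IsRooted => _callbackHandle.IsAllocated;

        public Session(ResultCallback callback)
        {
            // Root the delegate so the GC cannot collect it while native code holds the pointer.
            _callbackHandle = GCHandle.Alloc(callback);
            try
            {
                CallbackPointer = Marshal.GetFunctionPointerForDelegate(callback);
                // A real implementation would pass CallbackPointer to the native start export here.
            }
            catch
            {
                _callbackHandle.Free(); // error path: do not leak the handle
                throw;
            }
        }

        public void Dispose()
        {
            // A real implementation would call the native stop export before freeing,
            // so that no callback can fire after the delegate is released.
            if (_callbackHandle.IsAllocated) _callbackHandle.Free();
        }
    }
}
```

Checking `IsAllocated` before `Free()` makes `Dispose()` safe to call more than once, which matters when stop can be reached from both normal and error paths.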

Design highlights

  • Internal push queue — Bounded Channel<T> serializes audio pushes from any thread (safe for mic callbacks) and provides backpressure
  • Retry policy — Transient native errors retried with exponential backoff (3 attempts); permanent errors terminate the session
  • GCHandle lifecycle — Callback delegate explicitly tracked and freed on stop, with cleanup in error paths
  • Settings freeze — Audio format settings are snapshot-copied at StartAsync() and immutable during the session
  • Cancellation-safe stop — StopAsync always calls native stop even if cancelled, preventing native session leaks
  • Dedicated session CTS — Push loop uses its own CancellationTokenSource, decoupled from the caller's token
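
As a rough illustration of how the push queue, retry policy, and dedicated session CTS can fit together, here is a self-contained sketch. It is not the PR's implementation: `PushQueueSketch`, the `nativePush` delegate (standing in for the native push call, returning false on a transient failure), the capacity of 8, and the 10 ms base delay are all assumptions. Only the ideas come from the list above: a bounded `Channel<T>`, three attempts with exponential backoff, and a session-owned `CancellationTokenSource`.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical stand-in for the push pipeline; none of these names come from the SDK.
public sealed class PushQueueSketch : IAsyncDisposable
{
    private readonly Channel<byte[]> _queue;
    private readonly CancellationTokenSource _sessionCts = new(); // owned by the session, not the caller
    private readonly Func<byte[], bool> _nativePush;              // false means a transient failure
    private readonly Task _pushLoop;

    public int Delivered { get; private set; }

    public PushQueueSketch(Func<byte[], bool> nativePush, int capacity = 8)
    {
        _nativePush = nativePush;
        _queue = Channel.CreateBounded<byte[]>(new BoundedChannelOptions(capacity)
        {
            SingleReader = true,                    // one loop drains the queue, so pushes are serialized
            FullMode = BoundedChannelFullMode.Wait  // writers wait when the queue is full (backpressure)
        });
        _pushLoop = Task.Run(() => RunAsync(_sessionCts.Token));
    }

    // Safe to call from any thread, for example a microphone callback.
    public ValueTask PushAsync(byte[] pcm, CancellationToken ct = default)
        => _queue.Writer.WriteAsync(pcm, ct);

    private async Task RunAsync(CancellationToken ct)
    {
        try
        {
            await foreach (var chunk in _queue.Reader.ReadAllAsync(ct))
            {
                var delay = TimeSpan.FromMilliseconds(10);
                for (var attempt = 1; ; attempt++)
                {
                    if (_nativePush(chunk)) { Delivered++; break; }
                    if (attempt == 3)
                        throw new InvalidOperationException("push failed after 3 attempts");
                    await Task.Delay(delay, ct);
                    delay *= 2; // exponential backoff between attempts
                }
            }
        }
        catch (Exception ex)
        {
            _queue.Writer.TryComplete(ex); // fail any blocked writers instead of leaving them waiting
            throw;
        }
    }

    public async ValueTask DisposeAsync()
    {
        _queue.Writer.TryComplete(); // no more input; let the loop drain what is queued
        try { await _pushLoop; }
        finally { _sessionCts.Dispose(); }
    }
}
```

Because the loop runs on the session's own token, a caller cancelling one `PushAsync` call only abandons that write and does not tear down the drain loop.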

Native core dependency

This PR adds the C# SDK surface. The 3 native exports (audio_stream_start, audio_stream_push, audio_stream_stop) must be implemented in Microsoft.AI.Foundry.Local.Core before integration testing. The code compiles and unit tests for types/settings pass without the native library.

Native contract spec: See audio-streaming-codex.md for the full production contract.

Testing

  • ✅ Build succeeds (dotnet build — 0 errors)
  • ✅ 14 unit tests for serialization, error parsing, and settings (require native lib for assembly setup)
  • ⏳ Integration tests pending native core delivery

Related docs

https://microsoft-my.sharepoint.com/:w:/p/ruiren/IQBhsuMcvAByTqaVZhU7Xn7tAWgVj-EHvJYW2seiItfI2V0?e=HsmT7w

vercel bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
foundry-local Ready Ready Preview, Comment Mar 5, 2026 11:52pm
