Add real-time audio streaming support (Microphone ASR) - JS #486

Open
rui-ren wants to merge 1 commit into main from ruiren/audio-streaming-support-sdk-js

rui-ren commented Mar 5, 2026

Summary

Adds real-time audio streaming support to the Foundry Local JS SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI ASR.

The existing AudioClient only supports file-based transcription. This PR introduces a new AudioStreamingClient that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async iterable.
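The intended call pattern can be sketched against an in-memory stand-in. The method names below are the ones listed in this PR; the signatures, the TranscriptionResult fields (text, isFinal), and the FakeStreamingClient class are assumptions made only so the example runs without the native library.

```typescript
// Result shape and method signatures are assumptions; only the method
// names come from the PR description.
interface TranscriptionResult {
  text: string;
  isFinal: boolean;
}

// Hypothetical in-memory stand-in for the streaming client.
class FakeStreamingClient {
  private results: TranscriptionResult[] = [];
  private wake: (() => void) | null = null;
  private started = false;
  private stopped = false;
  private received = 0;

  async start(): Promise<void> {
    this.started = true;
  }

  async pushAudioData(chunk: Uint8Array): Promise<void> {
    if (!this.started || this.stopped) throw new Error("no active session");
    this.received += chunk.length;
    // Pretend each chunk produces a partial result.
    this.emit({ text: `${this.received} bytes`, isFinal: false });
  }

  // Yields results as they arrive and ends once the session is stopped.
  async *getTranscriptionStream(): AsyncGenerator<TranscriptionResult> {
    while (true) {
      while (this.results.length > 0) yield this.results.shift()!;
      if (this.stopped) return;
      await new Promise<void>((resolve) => {
        this.wake = resolve;
      });
    }
  }

  async stop(): Promise<void> {
    this.stopped = true;
    this.emit({ text: "done", isFinal: true });
  }

  private emit(result: TranscriptionResult): void {
    this.results.push(result);
    this.wake?.();
    this.wake = null;
  }
}

// Lifecycle: start, read the stream concurrently, push chunks, stop.
async function transcribe(
  client: FakeStreamingClient,
  chunks: Uint8Array[],
): Promise<string[]> {
  await client.start();
  const texts: string[] = [];
  const reader = (async () => {
    for await (const result of client.getTranscriptionStream()) {
      texts.push(result.text);
    }
  })();
  for (const chunk of chunks) await client.pushAudioData(chunk);
  await client.stop();
  await reader;
  return texts;
}
```

The reader loop is started before any audio is pushed so that partial results can be consumed while the microphone is still producing chunks.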

What's included

New files

  • src/openai/audioStreamingClient.ts — Main streaming client with start(), pushAudioData(), getTranscriptionStream(), stop(), dispose()
  • src/openai/audioStreamingTypes.ts — AudioStreamTranscriptionResult and CoreErrorResponse interfaces, tryParseCoreError() helper
  • test/openai/audioStreamingClient.test.ts — 16 Mocha/Chai unit tests for types, settings, and client state guards. Not included yet; they need to wait until the DLL is ready so they can be run against it.
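A helper in the spirit of tryParseCoreError() might look like the sketch below. The PR does not show the fields of CoreErrorResponse, so the code, message, and isTransient fields, and the names CoreErrorSketch and tryParseCoreErrorSketch, are hypothetical.

```typescript
// Hypothetical error shape; the real CoreErrorResponse fields are not
// shown in this PR.
interface CoreErrorSketch {
  code: string;
  message: string;
  isTransient: boolean;
}

// Returns a typed error when the raw string is a JSON object with the
// expected fields, and null otherwise. It never throws.
function tryParseCoreErrorSketch(raw: string): CoreErrorSketch | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as Record<string, unknown>;
  if (typeof candidate.code !== "string") return null;
  if (typeof candidate.message !== "string") return null;
  return {
    code: candidate.code,
    message: candidate.message,
    // Treat a missing or non-boolean flag as "not transient".
    isTransient: candidate.isTransient === true,
  };
}
```

Returning null instead of throwing lets the caller fall back to treating the raw string as an opaque error message.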

Modified files

  • src/imodel.ts — Added createAudioStreamingClient() to interface
  • src/model.ts — Delegates to selectedVariant.createAudioStreamingClient()
  • src/modelVariant.ts — Implementation (creates new AudioStreamingClient(modelId, coreInterop))
  • src/index.ts — Exports AudioStreamingClient, StreamingAudioSettings, AudioStreamTranscriptionResult, CoreErrorResponse

Design highlights

  • Internal async push queue — Bounded AsyncQueue<T> serializes audio pushes from any context (safe for mic callbacks) and provides backpressure. Mirrors C#'s Channel<T> pattern.
  • Retry policy — Transient native errors are retried with exponential backoff (3 attempts); permanent errors terminate the session
  • Settings freeze — Audio format settings are snapshot-copied and Object.freeze()d at start(), immutable during the session
  • Buffer copy — pushAudioData() copies the input Uint8Array before queueing, safe when caller reuses buffers
  • Drain-on-stop — stop() completes the push queue, waits for the push loop to drain, then calls native stop
  • Dispose safety — dispose() wraps stop() in try/catch, never throws
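Three of the highlights above (bounded queue with backpressure, settings freeze, buffer copy) can be sketched together as follows. This is an illustration under assumptions, not the PR's implementation: BoundedAsyncQueue, AudioFormatSketch, snapshotSettings, and enqueueCopy are hypothetical names, and the settings fields are invented.

```typescript
// Minimal bounded queue: push() waits while the queue is full, and the
// async iterator drains remaining items after complete() is called.
class BoundedAsyncQueue<T> implements AsyncIterable<T> {
  private items: T[] = [];
  private waitingReaders: Array<() => void> = [];
  private waitingWriters: Array<() => void> = [];
  private completed = false;

  constructor(private readonly capacity: number) {}

  async push(item: T): Promise<void> {
    // Backpressure: park the producer until a consumer frees a slot.
    while (this.items.length >= this.capacity && !this.completed) {
      await new Promise<void>((resolve) => {
        this.waitingWriters.push(resolve);
      });
    }
    if (this.completed) throw new Error("queue is completed");
    this.items.push(item);
    this.waitingReaders.shift()?.();
  }

  complete(): void {
    this.completed = true;
    for (const wake of this.waitingReaders.splice(0)) wake();
    for (const wake of this.waitingWriters.splice(0)) wake();
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<T> {
    while (true) {
      if (this.items.length > 0) {
        const item = this.items.shift() as T;
        this.waitingWriters.shift()?.(); // a slot is free again
        yield item;
      } else if (this.completed) {
        return;
      } else {
        await new Promise<void>((resolve) => {
          this.waitingReaders.push(resolve);
        });
      }
    }
  }
}

// Hypothetical settings shape, for illustration only.
interface AudioFormatSketch {
  sampleRate: number;
  channels: number;
}

// Snapshot-copy, then freeze, so later caller edits cannot leak in.
function snapshotSettings(
  settings: AudioFormatSketch,
): Readonly<AudioFormatSketch> {
  return Object.freeze({ ...settings });
}

// Uint8Array.prototype.slice copies the bytes, so the caller may reuse
// its buffer as soon as this function has been called.
async function enqueueCopy(
  queue: BoundedAsyncQueue<Uint8Array>,
  chunk: Uint8Array,
): Promise<void> {
  await queue.push(chunk.slice());
}
```

Because the iterator checks for remaining items before it checks the completed flag, calling complete() and then awaiting the consumer gives the drain-before-stop ordering described above.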

Native core dependency

This PR adds the JS SDK surface. The 3 native commands (audio_stream_start, audio_stream_push, audio_stream_stop) must be implemented in Microsoft.AI.Foundry.Local.Core before integration testing. The code compiles with zero TypeScript errors without the native library.

Native contract spec: See audio-streaming-codex.md in the C# SDK for the full production contract.

Testing

  • ✅ TypeScript compilation — 0 errors across all source files
  • ✅ 16 unit tests for serialization, error parsing, settings, and client state guards (no native dependency)
  • ⏳ Integration tests pending native core delivery

Parity with C# SDK

This implementation mirrors the C# OpenAIAudioStreamingClient (branch ruiren/audio-streaming-support-sdk) with identical logic:

  • Same session lifecycle: start → push → getStream → stop
  • Same push loop with retry and permanent error handling
  • Same settings freeze and buffer copy semantics
  • Same drain-before-stop ordering
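The retry behaviour of the push loop can be sketched as below. The PR does not show how transient and permanent errors are distinguished, so the PermanentPushError class, the pushWithRetry name, the base delay, and the reading of "3 attempts" as three total tries are all assumptions.

```typescript
// Hypothetical marker for errors that must end the session immediately.
class PermanentPushError extends Error {}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls pushOnce up to maxAttempts times. Any error other than
// PermanentPushError is treated as transient and retried after a delay
// that doubles each time (base, 2x base, ...). The last error is
// rethrown once attempts are exhausted.
async function pushWithRetry(
  pushOnce: () => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 50,
): Promise<void> {
  for (let attempt = 1; ; attempt++) {
    try {
      await pushOnce();
      return;
    } catch (err) {
      if (err instanceof PermanentPushError || attempt >= maxAttempts) {
        throw err;
      }
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

A push loop built on this would call pushWithRetry once per queued chunk and treat any error that escapes it as the signal to terminate the session.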

Related docs

https://microsoft-my.sharepoint.com/:w:/p/ruiren/IQBhsuMcvAByTqaVZhU7Xn7tAWgVj-EHvJYW2seiItfI2V0?e=HsmT7w

vercel bot commented Mar 5, 2026

The latest updates on your projects.

Project: foundry-local
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Mar 5, 2026 7:25pm
