Add real-time audio streaming support (Microphone ASR) - JS#486
Summary
Adds real-time audio streaming support to the Foundry Local JS SDK, enabling live microphone-to-text transcription via ONNX Runtime GenAI ASR.
The existing `AudioClient` only supports file-based transcription. This PR introduces a new `AudioStreamingClient` that accepts continuous PCM audio chunks (e.g., from a microphone) and returns partial/final transcription results as an async iterable.

What's included
New files

- `src/openai/audioStreamingClient.ts` — main streaming client with `start()`, `pushAudioData()`, `getTranscriptionStream()`, `stop()`, `dispose()`
- `src/openai/audioStreamingTypes.ts` — `AudioStreamTranscriptionResult` and `CoreErrorResponse` interfaces, plus the `tryParseCoreError()` helper
- `test/openai/audioStreamingClient.test.ts` — 16 Mocha/Chai unit tests for types, settings, and client state guards. Not run yet; they have to wait for the dll to be ready.

Modified files
- `src/imodel.ts` — added `createAudioStreamingClient()` to the interface
- `src/model.ts` — delegates to `selectedVariant.createAudioStreamingClient()`
- `src/modelVariant.ts` — implementation (creates `new AudioStreamingClient(modelId, coreInterop)`)
- `src/index.ts` — exports `AudioStreamingClient`, `StreamingAudioSettings`, `AudioStreamTranscriptionResult`, `CoreErrorResponse`

Design highlights
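The design notes in this section hinge on an `AsyncQueue<T>`. As a point of reference, here is a minimal sketch of that pattern in TypeScript — a hypothetical reconstruction, not the PR's actual code; in particular, the real queue's backpressure handling is not modeled here:

```typescript
// Hypothetical sketch of an AsyncQueue<T>: producers push() synchronously
// from any context, a single consumer drains via `for await`, and
// complete() ends the stream. Mirrors the shape of C#'s Channel<T>.
class AsyncQueue<T> {
  private items: T[] = [];
  private resolvers: ((r: IteratorResult<T>) => void)[] = [];
  private completed = false;

  push(item: T): void {
    if (this.completed) throw new Error("queue already completed");
    const resolve = this.resolvers.shift();
    if (resolve) {
      resolve({ value: item, done: false }); // a consumer is already waiting
    } else {
      this.items.push(item); // buffer until the consumer catches up
    }
  }

  complete(): void {
    this.completed = true;
    // Wake any waiting consumers with a "done" signal.
    for (const resolve of this.resolvers.splice(0)) {
      resolve({ value: undefined as unknown as T, done: true });
    }
  }

  next(): Promise<IteratorResult<T>> {
    if (this.items.length > 0) {
      return Promise.resolve({ value: this.items.shift() as T, done: false });
    }
    if (this.completed) {
      return Promise.resolve({ value: undefined as unknown as T, done: true });
    }
    // Nothing buffered yet: park the consumer until push() or complete().
    return new Promise((resolve) => this.resolvers.push(resolve));
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return this;
  }
}
```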
- `AsyncQueue<T>` serializes audio pushes from any context (safe for mic callbacks) and provides backpressure; mirrors C#'s `Channel<T>` pattern
- Settings are `Object.freeze()`d at `start()` and stay immutable during the session
- `pushAudioData()` copies the input `Uint8Array` before queueing, so it is safe for callers that reuse buffers
- `stop()` completes the push queue, waits for the push loop to drain, then calls the native stop
- `dispose()` wraps `stop()` in try/catch and never throws

Native core dependency
This PR adds the JS SDK surface. The three native commands (`audio_stream_start`, `audio_stream_push`, `audio_stream_stop`) must be implemented in `Microsoft.AI.Foundry.Local.Core` before integration testing. The code compiles with zero TypeScript errors without the native library.

Native contract spec: see `audio-streaming-codex.md` in the C# SDK for the full production contract.

Testing
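Until the native dll lands, the intended call pattern can only be exercised against a stub. The sketch below shows that pattern; the method names (`start`, `pushAudioData`, `getTranscriptionStream`, `stop`) come from this PR, but the signatures, the result fields, and the stub itself are assumptions for illustration, not the SDK's real surface:

```typescript
// Assumed result shape; the real AudioStreamTranscriptionResult may differ.
interface AudioStreamTranscriptionResult {
  text: string;
  isFinal: boolean; // assumed field name
}

// Stand-in for AudioStreamingClient so the flow is runnable without the dll.
class StubAudioStreamingClient {
  private chunks: Uint8Array[] = [];
  private started = false;

  async start(): Promise<void> {
    this.started = true;
  }

  pushAudioData(pcm: Uint8Array): void {
    if (!this.started) throw new Error("call start() first"); // state guard
    this.chunks.push(new Uint8Array(pcm)); // copy, mirroring the PR's design
  }

  async *getTranscriptionStream(): AsyncGenerator<AudioStreamTranscriptionResult> {
    // A real client would yield partial results as the native core emits them.
    yield { text: `received ${this.chunks.length} chunk(s)`, isFinal: true };
  }

  async stop(): Promise<void> {
    this.started = false;
  }
}

// The start → push → getStream → stop flow described in this PR.
async function demo(): Promise<string[]> {
  const client = new StubAudioStreamingClient();
  await client.start();
  client.pushAudioData(new Uint8Array(3200)); // 100 ms of 16 kHz 16-bit mono PCM
  const texts: string[] = [];
  for await (const result of client.getTranscriptionStream()) {
    texts.push(result.text);
  }
  await client.stop();
  return texts;
}
```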
Parity with C# SDK
This implementation mirrors the C# `OpenAIAudioStreamingClient` (branch `ruiren/audio-streaming-support-sdk`) with identical logic: start → push → getStream → stop.

Related docs
https://microsoft-my.sharepoint.com/:w:/p/ruiren/IQBhsuMcvAByTqaVZhU7Xn7tAWgVj-EHvJYW2seiItfI2V0?e=HsmT7w
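Since `tryParseCoreError()` ships in this PR but isn't shown above, here is a hypothetical sketch of the shape such a helper could take — the `CoreErrorResponse` field names below are assumptions, not the contents of `audioStreamingTypes.ts`:

```typescript
// Assumed error payload shape; the real CoreErrorResponse may differ.
interface CoreErrorResponse {
  error: {
    code: string;
    message: string;
  };
}

// Returns the parsed error when `raw` is a well-formed core error payload,
// otherwise null (non-JSON input, or JSON of the wrong shape).
function tryParseCoreError(raw: string): CoreErrorResponse | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      "error" in parsed &&
      typeof (parsed as CoreErrorResponse).error?.code === "string" &&
      typeof (parsed as CoreErrorResponse).error?.message === "string"
    ) {
      return parsed as CoreErrorResponse;
    }
  } catch {
    // Not JSON at all → not a core error payload.
  }
  return null;
}
```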