From c7986d9d44ae13ef3179a93ee68b5b716bf3e47e Mon Sep 17 00:00:00 2001
From: Evan Liu
Date: Tue, 26 Aug 2025 17:03:09 -0700
Subject: [PATCH 1/2] Update on-device explainer with changes + languages

---
 explainers/on-device-speech-recognition.md | 92 +++++++++++++++-------
 1 file changed, 63 insertions(+), 29 deletions(-)

diff --git a/explainers/on-device-speech-recognition.md b/explainers/on-device-speech-recognition.md
index 9991bda..a5ba841 100644
--- a/explainers/on-device-speech-recognition.md
+++ b/explainers/on-device-speech-recognition.md
@@ -31,30 +31,61 @@ Some websites would only adopt the Web Speech API if it meets strict performance
 ### 3. Educational website (e.g. khanacademy.org)
 Applications that need to function in unreliable or offline network conditions—such as voice-based productivity tools, educational software, or accessibility features—benefit from on-device speech recognition. This enables uninterrupted functionality during flights, remote travel, or in areas with limited connectivity. When on-device recognition is unavailable, a website can choose to hide the UI or gracefully degrade functionality to maintain a coherent user experience.
 
-## New Methods
+## New API Components
 
-### 1. `Promise<boolean> availableOnDevice(DOMString lang)`
-This method checks if on-device speech recognition is available for a specific language. Developers can use this to determine whether to enable features that require on-device speech recognition.
+This enhancement introduces one new attribute to the `SpeechRecognition` interface and two new static methods for managing on-device capabilities.
+
+### 1. `processLocally` Attribute
+The `processLocally` boolean attribute on a `SpeechRecognition` instance allows developers to require that speech recognition be performed locally on the user's device.
+
+- When set to `true`, the recognition session **must** be processed locally. If on-device recognition is not available for the specified language, the session will fail with a `service-not-allowed` error.
+- When `false` (the default), the user agent is free to use either local or cloud-based recognition.
 
 #### Example Usage
 ```javascript
-const lang = 'en-US';
-SpeechRecognition.availableOnDevice(lang).then((available) => {
-  if (available) {
-    console.log(`On-device speech recognition is available for ${lang}.`);
-  } else {
-    console.log(`On-device speech recognition is not available for ${lang}.`);
-  }
+const recognition = new SpeechRecognition();
+recognition.lang = 'en-US';
+recognition.processLocally = true; // Require on-device speech recognition.
+
+recognition.onerror = (event) => {
+  if (event.error === 'service-not-allowed') {
+    console.error('On-device recognition is not available for the selected language, or the request was denied.');
+  }
+};
+
+recognition.start();
+```
+
+### 2. `Promise<AvailabilityStatus> available(SpeechRecognitionOptions options)`
+The static `SpeechRecognition.available(options)` method allows developers to check the availability of speech recognition for a given set of languages and processing preferences. It returns a `Promise` that resolves with an `AvailabilityStatus` string.
+
+#### Example Usage
+```javascript
+const options = {
+  langs: ['en-US'],
+  processLocally: true // Check for on-device availability
+};
+
+SpeechRecognition.available(options).then((status) => {
+  console.log(`On-device availability for ${options.langs.join(', ')}: ${status}`);
+  if (status === 'available') {
+    console.log('Ready to use on-device recognition.');
+  } else if (status === 'downloadable') {
+    console.log('On-device recognition can be installed.');
+  }
 });
 ```
 
-### 2. `Promise<boolean> installOnDevice(DOMString[] lang)`
+### 2. `Promise<boolean> install(SpeechRecognitionOptions options)`
 This method install the resources required for on-device speech recognition for the given BCP-47 language codes. The installation process may download and configure necessary language models.
 
 #### Example Usage
 ```javascript
-const lang = 'en-US';
-SpeechRecognition.installOnDevice([lang]).then((success) => {
+const options = {
+  langs: ['en-US'],
+  processLocally: true
+};
+SpeechRecognition.install(options).then((success) => {
   if (success) {
     console.log('On-device speech recognition resources installed successfully.');
   } else {
@@ -63,22 +94,25 @@ SpeechRecognition.installOnDevice([lang]).then((success) => {
 });
 ```
 
-## New Attribute
-
-### 1. `mode` attribute in the `SpeechRecognition` interface
-The `mode` attribute in the `SpeechRecognition` interface defines how speech recognition should behave when starting a session.
-
-#### `SpeechRecognitionMode` Enum
-
-- **"on-device-preferred"**: Use on-device speech recognition if available. If not, fall back to cloud-based speech recognition.
-- **"on-device-only"**: Only use on-device speech recognition. If it's unavailable, throw an error.
-
-#### Example Usage
-```javascript
-const recognition = new SpeechRecognition();
-recognition.mode = "ondevice-only"; // Only use on-device speech recognition.
-recognition.start();
-```
+## Supported languages
+The availability of on-device speech recognition languages is user-agent dependent. As an example, Google Chrome supports the following languages for on-device recognition:
+* de-DE (German, Germany)
+* en-US (English, United States)
+* es-ES (Spanish, Spain)
+* fr-FR (French, France)
+* hi-IN (Hindi, India)
+* id-ID (Indonesian, Indonesia)
+* it-IT (Italian, Italy)
+* ja-JP (Japanese, Japan)
+* ko-KR (Korean, South Korea)
+* pl-PL (Polish, Poland)
+* pt-BR (Portuguese, Brazil)
+* ru-RU (Russian, Russia)
+* th-TH (Thai, Thailand)
+* tr-TR (Turkish, Turkey)
+* vi-VN (Vietnamese, Vietnam)
+* zh-CN (Chinese, Mandarin, Simplified)
+* zh-TW (Chinese, Mandarin, Traditional)
 
 ## Privacy considerations
 To reduce the risk of fingerprinting, user agents must implementing privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).
From a47d2d5344c9d5c1df7f702ea577fae3eb9310f8 Mon Sep 17 00:00:00 2001
From: Evan Liu
Date: Tue, 26 Aug 2025 17:03:09 -0700
Subject: [PATCH 2/2] Update on-device explainer with changes + languages

---
 explainers/on-device-speech-recognition.md | 97 ++++++++++++----------
 1 file changed, 55 insertions(+), 42 deletions(-)

diff --git a/explainers/on-device-speech-recognition.md b/explainers/on-device-speech-recognition.md
index a5ba841..a1e24dc 100644
--- a/explainers/on-device-speech-recognition.md
+++ b/explainers/on-device-speech-recognition.md
@@ -11,7 +11,7 @@ The Web Speech API is a powerful browser feature that enables applications to pe
 To address these issues, we introduce **on-device speech recognition capabilities** as part of the Web Speech API. This enhancement allows speech recognition to run locally on user devices, providing a faster, more private, and offline-compatible experience.
 
 ## Why Use On-Device Speech Recognition?
-
+
 ### 1. **Privacy**
 On-device processing ensures that neither raw audio nor transcriptions leave the user's device, enhancing data security and user trust.
 
@@ -20,76 +20,89 @@ Local processing reduces latency, providing a smoother and faster user experienc
 ### 3. **Offline Functionality**
 Applications can offer speech recognition capabilities even without an active internet connection, increasing their utility in remote or low-connectivity environments.
 
+## New API Members
 
-## Example use cases
-### 1. Company with data residency requirements
-Websites with strict data residency requirements (i.e., regulatory, legal, or company policy) can ensure that audio data remains on the user's device and is not sent over the network for processing. This is particularly crucial for compliance with regulations like GDPR, which considers voice as personally identifiable information (PII) as voice recordings can reveal information about an individual's gender, ethnic origin, or even potential health conditions. On-device processing significantly enhances user privacy by minimizing the exposure of sensitive voice data.
+This enhancement introduces new members to the Web Speech API to support on-device recognition: a dictionary for configuration, an instance attribute, and static methods for managing capabilities.
 
-### 2. Video conferencing service with strict performance requirements (e.g. meet.google.com)
-Some websites would only adopt the Web Speech API if it meets strict performance requirements. On-device speech recognition may provide better accuracy and latency as well as provide additional features (e.g. contextual biasing) that may not be available by the cloud-based service used by the user agent. In the event on-device speech recognition is not available, these websites may elect to use an alternative cloud-based speech recognition provider that meet these requirements instead of the default one provided by the user agent.
+### `SpeechRecognitionOptions` Dictionary
 
-### 3. Educational website (e.g. khanacademy.org)
-Applications that need to function in unreliable or offline network conditions—such as voice-based productivity tools, educational software, or accessibility features—benefit from on-device speech recognition. This enables uninterrupted functionality during flights, remote travel, or in areas with limited connectivity. When on-device recognition is unavailable, a website can choose to hide the UI or gracefully degrade functionality to maintain a coherent user experience.
+This dictionary is used to configure speech recognition preferences, both for individual sessions and for querying or installing capabilities.
 
-## New API Components
+It includes the following members:
 
-This enhancement introduces one new attribute to the `SpeechRecognition` interface and two new static methods for managing on-device capabilities.
+- `langs`: A required sequence of `DOMString` representing BCP-47 language tags (e.g., `['en-US']`).
+- `processLocally`: A boolean that, if `true`, instructs the recognition to be performed on-device. If `false` (the default), any available recognition method (cloud-based or on-device) may be used.
 
-### 1. `processLocally` Attribute
-The `processLocally` boolean attribute on a `SpeechRecognition` instance allows developers to require that speech recognition be performed locally on the user's device.
 
-- When set to `true`, the recognition session **must** be processed locally. If on-device recognition is not available for the specified language, the session will fail with a `service-not-allowed` error.
-- When `false` (the default), the user agent is free to use either local or cloud-based recognition.
+```idl
+dictionary SpeechRecognitionOptions {
+  required sequence<DOMString> langs; // BCP-47 language tags
+  boolean processLocally = false; // Instructs the recognition to be performed on-device. If `false` (default), any available recognition method may be used.
+};
+```
 
 #### Example Usage
 ```javascript
 const recognition = new SpeechRecognition();
-recognition.lang = 'en-US';
-recognition.processLocally = true; // Require on-device speech recognition.
-
-recognition.onerror = (event) => {
-  if (event.error === 'service-not-allowed') {
-    console.error('On-device recognition is not available for the selected language, or the request was denied.');
-  }
+recognition.options = {
+  langs: ['en-US'],
+  processLocally: true
 };
-
 recognition.start();
 ```
 
-### 2. `Promise<AvailabilityStatus> available(SpeechRecognitionOptions options)`
-The static `SpeechRecognition.available(options)` method allows developers to check the availability of speech recognition for a given set of languages and processing preferences. It returns a `Promise` that resolves with an `AvailabilityStatus` string.
+## Example use cases
+### 1. Company with data residency requirements
+Websites with strict data residency requirements (i.e., regulatory, legal, or company policy) can ensure that audio data remains on the user's device and is not sent over the network for processing. This is particularly crucial for compliance with regulations like GDPR, which considers voice as personally identifiable information (PII) as voice recordings can reveal information about an individual's gender, ethnic origin, or even potential health conditions. On-device processing significantly enhances user privacy by minimizing the exposure of sensitive voice data.
+
+### 2. Video conferencing service with strict performance requirements (e.g. meet.google.com)
+Some websites would only adopt the Web Speech API if it meets strict performance requirements. On-device speech recognition may provide better accuracy and latency as well as provide additional features (e.g. contextual biasing) that may not be available by the cloud-based service used by the user agent. In the event on-device speech recognition is not available, these websites may elect to use an alternative cloud-based speech recognition provider that meet these requirements instead of the default one provided by the user agent.
+
+### 3. Educational website (e.g. khanacademy.org)
+Applications that need to function in unreliable or offline network conditions—such as voice-based productivity tools, educational software, or accessibility features—benefit from on-device speech recognition. This enables uninterrupted functionality during flights, remote travel, or in areas with limited connectivity. When on-device recognition is unavailable, a website can choose to hide the UI or gracefully degrade functionality to maintain a coherent user experience.
+
+## New API Components
+
+### 1. `static Promise<AvailabilityStatus> SpeechRecognition.available(SpeechRecognitionOptions options)`
+This static method checks the availability of speech recognition capabilities matching the provided `SpeechRecognitionOptions`.
+
+The method returns a `Promise` that resolves to an `AvailabilityStatus` enum string:
+- `"available"`: Ready to use according to the specified options.
+- `"downloadable"`: Not currently available, but resources (e.g., language packs for on-device) can be downloaded.
+- `"downloading"`: Resources are currently being downloaded.
+- `"unavailable"`: Not available and not downloadable.
 
 #### Example Usage
 ```javascript
-const options = {
-  langs: ['en-US'],
-  processLocally: true // Check for on-device availability
-};
+// Check availability for on-device English (US)
+const options = { langs: ['en-US'], processLocally: true };
 
 SpeechRecognition.available(options).then((status) => {
-  console.log(`On-device availability for ${options.langs.join(', ')}: ${status}`);
-  if (status === 'available') {
-    console.log('Ready to use on-device recognition.');
-  } else if (status === 'downloadable') {
-    console.log('On-device recognition can be installed.');
-  }
+    console.log(`Speech recognition status for ${options.langs.join(', ')} (on-device): ${status}.`);
+    if (status === 'available') {
+        console.log('Ready to use on-device speech recognition.');
+    } else if (status === 'downloadable') {
+        console.log('Resources are downloadable. Call install() if needed.');
+    } else if (status === 'downloading') {
+        console.log('Resources are currently downloading.');
+    } else {
+        console.log('Not available for on-device speech recognition.');
+    }
 });
 ```
 
 ### 2. `Promise<boolean> install(SpeechRecognitionOptions options)`
-This method install the resources required for on-device speech recognition for the given BCP-47 language codes. The installation process may download and configure necessary language models.
+This method installs the resources required for speech recognition matching the provided `SpeechRecognitionOptions`. The installation process may download and configure necessary language models.
 
 #### Example Usage
 ```javascript
-const options = {
-  langs: ['en-US'],
-  processLocally: true
-};
+// Install on-device resources for English (US)
+const options = { langs: ['en-US'], processLocally: true };
 SpeechRecognition.install(options).then((success) => {
   if (success) {
-    console.log('On-device speech recognition resources installed successfully.');
+    console.log(`On-device speech recognition resources for ${options.langs.join(', ')} installed successfully.`);
   } else {
-    console.error('Unable to install on-device speech recognition.');
+    console.error(`Unable to install on-device speech recognition resources for ${options.langs.join(', ')}. This could be due to unsupported languages or download issues.`);
   }
 });
 ```
@@ -115,7 +128,7 @@ The availability of on-device speech recognition languages is user-agent depende
 * zh-TW (Chinese, Mandarin, Traditional)
 
 ## Privacy considerations
-To reduce the risk of fingerprinting, user agents must implementing privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).
+To reduce the risk of fingerprinting, user agents must implement privacy-preserving countermeasures. The Web Speech API will employ the same masking techniques used by the [Web Translation API](https://github.com/webmachinelearning/writing-assistance-apis/pull/47).
 
 ## Conclusion
 The addition of on-device speech recognition capabilities to the Web Speech API marks a significant step forward in creating more private, performant, and accessible web applications. By leveraging these new methods, developers can enhance user experiences while addressing key concerns around privacy and connectivity.
\ No newline at end of file