Voice agents: integrations-first overview + Voice SDK page (hide Flow) #209
ArchieMcM234 wants to merge 1 commit into `main`.
Conversation
Pull request overview
Refactors the Voice agents documentation to emphasize partner integrations as the fastest onboarding path, while preserving and expanding Voice SDK documentation on a dedicated page.
Changes:
- Reworked the Voice agents overview to be integrations-first and added integration cards (Vapi/LiveKit/Pipecat).
- Renamed/expanded the former “Features” doc into a dedicated Voice SDK page and updated the Voice agents sidebar accordingly.
- Adjusted global sidebar ordering to place Voice agents after Text to speech.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| sidebars.ts | Reorders the top-level sidebar list to move "Voice agents" below "Text to speech". |
| docs/voice-agents/overview.mdx | Replaces the overview content with an integrations-first entry point and links to the Voice SDK page. |
| docs/voice-agents/sidebar.ts | Updates the Voice agents sidebar to show only Overview + Voice SDK (hides Flow). |
| docs/voice-agents/voice-sdk.mdx | Introduces a richer Voice SDK landing page (getting started, quickstart, presets, and config guidance). |
Comments suppressed due to low confidence (3)
docs/voice-agents/voice-sdk.mdx:10
`VoiceConfigSerialization` is imported but never used in this MDX file. Removing the unused raw import will avoid unnecessary bundling and keep the page easier to maintain.

docs/voice-agents/voice-sdk.mdx:101
The code sample comment says "interruptable"; the correct spelling is "interruptible".

docs/voice-agents/voice-sdk.mdx:137
The snippet for listing presets references `VoiceAgentConfigPreset.list_presets()` without showing where `VoiceAgentConfigPreset` comes from. Consider adding the relevant import (or fully qualifying it) so the example is copy/paste runnable.
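To make the snippet copy/paste runnable as the comment suggests, the preset class just needs to be in scope. A minimal stand-in sketch (this class body and the preset names are hypothetical illustrations, not the real SDK's implementation):

```python
# Hypothetical stand-in for the SDK's preset registry. In the real docs
# page, VoiceAgentConfigPreset would come from the Voice SDK import.
class VoiceAgentConfigPreset:
    _presets = {
        "conversation": {"turn_detection": True},
        "captions": {"turn_detection": False},
    }

    @classmethod
    def list_presets(cls):
        # Return the names of the available ready-to-use configurations.
        return sorted(cls._presets)

# With the class imported (or defined) above, the doc snippet runs as-is:
print(VoiceAgentConfigPreset.list_presets())  # → ['captions', 'conversation']
```

The point of the review comment is simply that the doc snippet should either include the import line or fully qualify the name, so a reader can paste it and run it without hunting for the module.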
> Here's how to stream microphone audio to the Voice Agent and transcribe finalised segments of speech, with speaker ID:
Could we move this to an external file and import it?
Yes, this slipped through the cracks.
This is fixed in the next PR, which refactors the Voice SDK page into (hopefully) quite a comprehensive page.
Just as a note, it is stacked as a branch off of this branch.
lgavincrl left a comment:
I think the voice agents section should be moved to "Voice Agents, Integrations and SDKs". This way it's a little more 'together': it's an SDK and it encompasses the integration partners.
Otherwise, nice.
> If you're building it yourself, you can also use our Voice SDK. Integrations are built on top of the Voice SDK, which provides features optimized for conversational AI.
>
> ### Voice SDK vs Realtime SDK
>
> If you're building an integration and want to work with us, contact support.
Link to contact support like this:
[contact support](https://support.speechmatics.com).
I'd add this at the bottom of the page as a 'next steps'
> Use the Voice SDK when:
>
> ## Features
>
> - Building conversational AI or voice agents
> - You need automatic turn detection
> - You want speaker-focused transcription
> - You need ready-to-use presets for common scenarios
>
> Speechmatics provides building blocks you can use through integrations and the Voice SDK.
>
> Use the Realtime SDK when:
>
> It includes:
>
> - You need the raw stream of word-by-word transcription data
> - Building custom segmentation logic
> - You want fine-grained control over every event
> - Processing audio files or custom workflows
>
> - **Turn detection**: detect when a speaker has finished talking.
> - **Intelligent segmentation**: group partial transcripts into clean, speaker-attributed segments.
> - **Diarization**: identify and label different speakers.
> - **Speaker focus**: focus on or ignore specific speakers in multi-speaker scenarios.
> - **Preset configurations**: start quickly with ready-to-use settings.
> - **Structured events**: work with clean segments instead of raw word-level events.
Are we diverting from the "when to use x or y" scenario? The presentation of the differing use cases for the Realtime vs Voice SDKs was something that was highlighted as important previously.
> # Voice SDK overview
>
> The Voice SDK builds on our Realtime API to provide additional features optimized for conversational AI, using Python:
>
> Our integration partners can be the quickest way to get a production voice agent up and running.
The fastest way to create a production-ready voice agent is through utilizing our integration partners:
(add link cards here perhaps?)
> - **Preset configurations**: offers ready-to-use settings for conversations, note-taking, and captions.
> - **Simplified event handling**: delivers clean, structured segments instead of raw word-level events.
>
> If you're building it yourself, you can also use our Voice SDK. Integrations are built on top of the Voice SDK, which provides features optimized for conversational AI.
Voice agents overview
Production-ready voice agents with features optimized for conversational AI can be built using the Voice SDK, or through one of our integration partners, which are built on top of the Voice SDK:
(add link cards here perhaps? and add one for the voice sdk)
> - You need automatic turn detection
> - You want speaker-focused transcription
> - You need ready-to-use presets for common scenarios
>
> Speechmatics provides building blocks you can use through integrations and the Voice SDK.
I'd phrase this differently: try to introduce our features, and how they are optimized for conversational AI to enhance voice agents.
> ### Voice SDK vs Realtime SDK
>
> Use the Voice SDK when:
Perhaps also add a "when to use integrations"?
> - Building conversational AI or voice agents
> - You need automatic turn detection
> - You want speaker-focused transcription
> - You need ready-to-use presets for common scenarios
>
> Use the Realtime SDK when:
>
> - You need the raw stream of word-by-word transcription data
> - Building custom segmentation logic
> - You want fine-grained control over every event
> - Processing audio files or custom workflows
>
> ## Getting started
>
> ### 1. Create an API key
Perhaps add this as a 'Voice agents > Quickstart' page; see the realtime STT quickstart for example: https://docs.speechmatics.com/speech-to-text/realtime/quickstart
> ### 3. Quickstart
>
> Here's how to stream microphone audio to the Voice Agent and transcribe finalised segments of speech, with speaker ID:
Is this using very basic configs? Or is this using the presets config? I think it would be good to say which.
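For reference, the shape of the chunked streaming loop the quickstart describes can be sketched with the stdlib only. Here `fake_mic` stands in for the microphone stream and the SDK client call is omitted; everything except the SAMPLE_RATE/CHUNK_SIZE figures from the diff is an assumption:

```python
import io

SAMPLE_RATE = 16000          # 16 kHz mono PCM
CHUNK_SIZE = 160             # samples read per iteration
BYTES_PER_SAMPLE = 2         # 16-bit audio
CHUNK_BYTES = CHUNK_SIZE * BYTES_PER_SAMPLE

# Stand-in for a microphone: one second of silence as raw 16-bit PCM.
fake_mic = io.BytesIO(b"\x00" * SAMPLE_RATE * BYTES_PER_SAMPLE)

sent = 0
while True:
    chunk = fake_mic.read(CHUNK_BYTES)
    if not chunk:
        break
    # In the real quickstart this chunk would be handed to the Voice SDK
    # client, which emits finalised, speaker-attributed segments back.
    sent += 1

print(sent)  # 100 chunks of 10 ms each = 1 second of audio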
> ```python
> # Audio configuration
> SAMPLE_RATE = 16000  # Hz
> CHUNK_SIZE = 160  # Samples per read
> ```
"Samples per read": state this in a really basic way. It needs to be crystal clear, with no ambiguity as to what this config is or what it does.
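Concretely, `CHUNK_SIZE` is how many audio samples are pulled from the microphone on each read. The arithmetic below (plain Python, no SDK involved) shows what that means in time and bytes, assuming 16-bit mono audio:

```python
SAMPLE_RATE = 16000  # samples per second (Hz)
CHUNK_SIZE = 160     # samples pulled from the mic on each read

# Duration of one chunk: samples divided by samples-per-second.
chunk_ms = CHUNK_SIZE / SAMPLE_RATE * 1000
# Size of one chunk for 16-bit (2-byte) mono PCM.
chunk_bytes = CHUNK_SIZE * BYTES_PER_SAMPLE if (BYTES_PER_SAMPLE := 2) else 0

print(chunk_ms)     # 10.0 -> each read delivers 10 ms of audio
print(chunk_bytes)  # 320  -> 320 bytes per read at 16-bit mono
```

Spelling it out that way ("each read is 10 ms of audio") is one way to remove the ambiguity the comment flags.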
> @@ -45,7 +199,8 @@
> Silence duration in seconds to trigger turn end.
>
> Maximum delay before forcing turn end.
>
> `max_delay` (float, default: 0.7)
> Maximum transcription delay for word emission.
> Defaults to 0.7 seconds, but when using turn detection we recommend 1.0s for better accuracy. Turn detection will ensure finalisation latency is not affected.
>
> ### Speaker configuration
>
> `speaker_sensitivity` (float, default: 0.5)
I'd put this on a different page (if you don't use a separate quickstart), or in the tab layout - I don't think it's great to have super lengthy pages.
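The parameters quoted in the diff above could be collected into a small config object. The field names and defaults (`max_delay` 0.7, `speaker_sensitivity` 0.5) come from the diff; the dataclass itself is only an illustrative sketch, not the SDK's actual API:

```python
from dataclasses import dataclass

@dataclass
class TurnDetectionConfig:
    # Maximum transcription delay for word emission, in seconds.
    # Default is 0.7; the diff recommends 1.0 when turn detection is on.
    max_delay: float = 0.7
    # Speaker configuration parameter quoted in the diff (default 0.5).
    speaker_sensitivity: float = 0.5

# The recommended override from the diff: raise max_delay alongside
# turn detection, since turn detection keeps finalisation latency down.
config = TurnDetectionConfig(max_delay=1.0)
print(config.max_delay, config.speaker_sensitivity)  # 1.0 0.5
```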
Summary
This PR refactors the Voice agents docs to make it easier for users to get started via integration partners, while keeping the Voice SDK documentation available in a dedicated page.
What changed
Renamed the `Features` page to `voice-sdk`, and expanded it with:

Navigation / IA
Voice agents sidebar is now:
Notes / follow-ups
Testing
Files changed
docs/sidebars.ts