Voice agents: integrations-first overview + Voice SDK page (hide Flow)#209

Open
ArchieMcM234 wants to merge 1 commit into main from deprecate-flow-integrations

Conversation

@ArchieMcM234 (Contributor)

Summary

This PR refactors the Voice agents docs to make it easier for users to get started via integration partners, while keeping the Voice SDK documentation available in a dedicated page.

What changed

  • Hid “Flow” from the Voice agents sidebar (content remains in the repo; it’s just no longer shown in navigation).
  • Updated the Voice agents overview to be integrations-first, including cards linking to:
    • Vapi
    • LiveKit
    • Pipecat
  • Created a dedicated Voice SDK page by renaming the former Features page to voice-sdk, and expanding it with:
    • Getting started + quickstart
    • Presets overview
    • Custom config examples
    • Full configuration reference

Navigation / IA

Voice agents sidebar is now:

  • Overview
  • Voice SDK

Notes / follow-ups

  • This keeps the SDK docs in-repo for now; we may want to replace parts of the Voice SDK page with a link to the SDK GitHub README in future.
  • In the meantime, I intend to revisit the Voice SDK page to make it more comprehensive: expand coverage, improve structure, and fill remaining gaps.
  • Flow content is intentionally retained but not exposed in navigation.

Testing

  • Verified docs compile locally after updating sidebar doc IDs and internal links.

Files changed

  • docs/docs/voice-agents/overview.mdx
  • docs/docs/voice-agents/sidebar.ts
  • docs/docs/voice-agents/features.mdx -> docs/docs/voice-agents/voice-sdk.mdx
  • docs/sidebars.ts

@vercel

vercel bot commented Feb 20, 2026

The latest updates on your projects.

Project docs: deployment Ready (Preview, Comment). Updated Feb 20, 2026 4:52pm (UTC).

Copilot AI left a comment
Pull request overview

Refactors the Voice agents documentation to emphasize partner integrations as the fastest onboarding path, while preserving and expanding Voice SDK documentation on a dedicated page.

Changes:

  • Reworked the Voice agents overview to be integrations-first and added integration cards (Vapi/LiveKit/Pipecat).
  • Renamed/expanded the former “Features” doc into a dedicated Voice SDK page and updated the Voice agents sidebar accordingly.
  • Adjusted global sidebar ordering to place Voice agents after Text to speech.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File descriptions:

  • sidebars.ts: Reorders the top-level sidebar list to move "Voice agents" below "Text to speech".
  • docs/voice-agents/overview.mdx: Replaces the overview content with an integrations-first entry point and links to the Voice SDK page.
  • docs/voice-agents/sidebar.ts: Updates the Voice agents sidebar to show only Overview + Voice SDK (hides Flow).
  • docs/voice-agents/voice-sdk.mdx: Introduces a richer Voice SDK landing page (getting started, quickstart, presets, and config guidance).
Comments suppressed due to low confidence (3)

  • docs/voice-agents/voice-sdk.mdx:10
    pythonVoiceConfigSerialization is imported but never used in this MDX file. Removing the unused raw import will avoid unnecessary bundling and keep the page easier to maintain.
  • docs/voice-agents/voice-sdk.mdx:101
    The code sample comment says "interruptable"; the correct spelling is "interruptible".
  • docs/voice-agents/voice-sdk.mdx:137
    The snippet for listing presets references VoiceAgentConfigPreset.list_presets() without showing where VoiceAgentConfigPreset comes from. Consider adding the relevant import (or fully qualifying it) so the example is copy/paste runnable.


Here's how to stream microphone audio to the Voice Agent and transcribe finalised segments of speech, with speaker ID:

Collaborator

Could we move this to an external file and import it?

Contributor Author

Yes, this slipped through the cracks.
This is fixed in the next PR, which refactors the Voice SDK page into (hopefully) quite a comprehensive page.

Just as a note it is stacked as a branch off of this branch.

@lgavincrl (Contributor) left a comment

I think the voice agents section should be moved to:

"Voice Agents, Integrations and SDKs". This way it's a little more 'together': it's an SDK and encompasses the integration partners.

Otherwise, nice.

If you’re building it yourself, you can also use our Voice SDK. Integrations are built on top of the Voice SDK, which provides features optimized for conversational AI.

### Voice SDK vs Realtime SDK
If you’re building an integration and want to work with us, contact support.
Contributor

Link to contact support like this:
[contact support](https://support.speechmatics.com).

Contributor

I'd add this at the bottom of the page as a 'next steps' section.

Comment on lines -24 to +26

Removed:

Use the Voice SDK when:

- Building conversational AI or voice agents
- You need automatic turn detection
- You want speaker-focused transcription
- You need ready-to-use presets for common scenarios

Use the Realtime SDK when:

- You need the raw stream of word-by-word transcription data
- Building custom segmentation logic
- You want fine-grained control over every event
- Processing audio files or custom workflows

Added:

## Features

Speechmatics provides building blocks you can use through integrations and the Voice SDK.

It includes:

- **Turn detection**: detect when a speaker has finished talking.
- **Intelligent segmentation**: group partial transcripts into clean, speaker-attributed segments.
- **Diarization**: identify and label different speakers.
- **Speaker focus**: focus on or ignore specific speakers in multi-speaker scenarios.
- **Preset configurations**: start quickly with ready-to-use settings.
- **Structured events**: work with clean segments instead of raw word-level events.
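To make the "structured events" feature concrete, here is a hypothetical sketch of what a speaker-attributed segment could look like. The class and field names are illustrative assumptions, not the Voice SDK's actual API:

```python
from dataclasses import dataclass

# Hypothetical shape of a speaker-attributed segment event.
# Field names are illustrative assumptions, not the Voice SDK's real API.
@dataclass
class Segment:
    speaker: str       # diarization label, e.g. "S1"
    text: str          # clean transcript text for this segment
    start_time: float  # seconds from the start of the stream
    end_time: float    # seconds from the start of the stream
    is_final: bool     # True once turn detection closes the segment

seg = Segment(speaker="S1", text="Hello there.", start_time=0.0,
              end_time=1.2, is_final=True)
print(f"{seg.speaker}: {seg.text}")  # prints "S1: Hello there."
```

A consumer then handles whole segments rather than stitching together raw word-level events itself.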
Contributor

Are we diverting from the "when to use x or y" scenario? The presentation of the differing use cases for the Realtime vs Voice SDKs was highlighted as important previously.


# Voice SDK overview
The Voice SDK builds on our Realtime API to provide additional features optimized for conversational AI, using Python:
Our integration partners can be the quickest way to get a production voice agent up and running.
Contributor

The fastest way to create a production-ready voice agent is through utilizing our integration partners:
(add link cards here perhaps?)

(Link cards to /integrations-and-sdks/vapi, /integrations-and-sdks/livekit, and /integrations-and-sdks/pipecat.)

- **Preset configurations**: offers ready-to-use settings for conversations, note-taking, and captions.
- **Simplified event handling**: delivers clean, structured segments instead of raw word-level events.
If you’re building it yourself, you can also use our Voice SDK. Integrations are built on top of the Voice SDK, which provides features optimized for conversational AI.

Contributor

Voice agents overview

Production-ready voice agents with features optimized for conversational AI can be built using the Voice SDK, or through one of our integration partners, which are built on top of the Voice SDK:

(add link cards here perhaps? and add one for the voice sdk)

- You need automatic turn detection
- You want speaker-focused transcription
- You need ready-to-use presets for common scenarios
Speechmatics provides building blocks you can use through integrations and the Voice SDK.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd phrase this differently: try to introduce our features and how they are optimized for conversational AI to enhance voice agents.


### Voice SDK vs Realtime SDK

Use the Voice SDK when:
Contributor

Perhaps also add a 'when to use integrations'?

Comment on lines +29 to +44

- Building conversational AI or voice agents
- You need automatic turn detection
- You want speaker-focused transcription
- You need ready-to-use presets for common scenarios

Use the Realtime SDK when:

- You need the raw stream of word-by-word transcription data
- Building custom segmentation logic
- You want fine-grained control over every event
- Processing audio files or custom workflows

## Getting started

### 1. Create an API key
Contributor

Perhaps add this as a 'Voice agents > Quickstart' page, see the Rt Stt quickstart for example: https://docs.speechmatics.com/speech-to-text/realtime/quickstart


### 3. Quickstart

Here's how to stream microphone audio to the Voice Agent and transcribe finalised segments of speech, with speaker ID:
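As a rough sketch of the read-loop shape only: an in-memory buffer stands in for the microphone, and the actual Voice Agent client call is omitted rather than guessed at. The sizes match the audio configuration shown further down (16 kHz, 160 samples per read):

```python
SAMPLE_RATE = 16000  # Hz, 16-bit mono PCM
CHUNK_SIZE = 160     # samples per read (10 ms of audio at 16 kHz)

def iter_chunks(pcm: bytes, chunk_samples: int = CHUNK_SIZE):
    """Yield fixed-size chunks of 16-bit PCM, as a mic read loop would."""
    step = chunk_samples * 2  # 2 bytes per 16-bit sample
    for offset in range(0, len(pcm), step):
        yield pcm[offset:offset + step]

# One second of silence stands in for live microphone input.
audio = b"\x00\x00" * SAMPLE_RATE
chunks = list(iter_chunks(audio))
print(len(chunks), len(chunks[0]))  # 100 chunks of 320 bytes (10 ms each)
```

In the real quickstart, each chunk would be sent to the Voice Agent client as it is read.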
Contributor

Is this using very basic configs, or is this using the presets config? I think it would be good to say which.


```python
# Audio configuration
SAMPLE_RATE = 16000  # Hz
CHUNK_SIZE = 160     # Samples per read
```
Contributor

Samples per read:
state this in a really basic way. It needs to be crystal clear, with no ambiguity as to what this config is and what it does.
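For instance, the comment could spell the arithmetic out next to the constants (a sketch of possible wording, not existing docs text):

```python
SAMPLE_RATE = 16000  # audio samples captured per second (Hz)
CHUNK_SIZE = 160     # number of samples pulled from the microphone per read

# Each read therefore covers CHUNK_SIZE / SAMPLE_RATE seconds of audio.
chunk_ms = CHUNK_SIZE / SAMPLE_RATE * 1000
print(chunk_ms)  # 10.0 -> each read is 10 ms of audio
```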

Comment on lines 117 to 206
@@ -45,7 +199,8 @@ Silence duration in seconds to trigger turn end.
Maximum delay before forcing turn end.

`max_delay` (float, default: 0.7)
Maximum transcription delay for word emission.
Defaults to 0.7 seconds, but when using turn detection we recommend 1.0s for better accuracy. Turn detection will ensure finalisation latency is not affected.

### Speaker configuration
`speaker_sensitivity` (float, default: 0.5)
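As a hedged sketch of how these parameters might sit together: only the names, defaults, and the 1.0 s recommendation come from the reference above; the dict layout itself is an assumption, not the SDK's actual config object.

```python
# Sketch only: parameter names and values follow the reference above,
# but this dict layout is assumed, not the SDK's actual config object.
voice_config = {
    "max_delay": 1.0,            # default 0.7; 1.0 s recommended with turn detection
    "speaker_sensitivity": 0.5,  # default
}
print(voice_config["max_delay"])
```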
Contributor

I'd put this on a different page (if you don't use a separate quickstart), or in the tab layout. I don't think it's great to have super lengthy pages.
