fix: stabilize model-level TTS routing and add model TTS settings by zhangmo8 · Pull Request #1632 · ThinkInAIXYZ/deepchat

zhangmo8 · 2026-05-15T10:15:36Z

Summary

This PR finishes and hardens model-level TTS support across runtime, model DB inference, and settings UI.

Runtime / Routing

- normalize TTS model IDs so prefixed IDs (e.g. openai/tts-1) are recognized
- expand chat-audio TTS detection to include Xiaomi alias prefix
- improve Pattern B chat-completions request by adding modalities: ["text", "audio"]
- improve Pattern B response parsing with fallback path extraction
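
As an illustration of the ID normalization described above, here is a minimal sketch. The helper names `normalizeTtsModelId` and `isKnownTtsModelId` are hypothetical and not taken from the PR, and the lookup set only reuses model IDs that the walkthrough lists for the model database; the real helper in `src/shared/ttsSettings.ts` may work differently.

```typescript
// Hypothetical helpers for illustration only; not the PR's actual code.
// The IDs below are the OpenAI-style entries named in the walkthrough.
const KNOWN_TTS_MODEL_IDS = new Set(['tts-1', 'tts-1-hd', 'gpt-4o-mini-tts'])

/** Strip any provider prefix ("openai/tts-1" -> "tts-1") and lowercase the ID. */
function normalizeTtsModelId(modelId: string): string {
  const trimmed = modelId.trim().toLowerCase()
  const lastSlash = trimmed.lastIndexOf('/')
  return lastSlash === -1 ? trimmed : trimmed.slice(lastSlash + 1)
}

/** True when the normalized ID matches one of the known TTS model IDs. */
function isKnownTtsModelId(modelId: string): boolean {
  return KNOWN_TTS_MODEL_IDS.has(normalizeTtsModelId(modelId))
}
```

Normalizing before the lookup means a prefixed ID such as `openai/tts-1` and the bare `tts-1` take the same routing branch.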

Model Type / Endpoint Inference

- support tts in provider-model type inference paths
- map inferred ModelType.TTS to ApiEndpointType.AudioSpeech
- keep provider-managed model config sync aligned with TTS endpoint behavior
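
The inference and mapping steps above could look roughly like the sketch below. Only the names `ModelType.TTS` and `ApiEndpointType.AudioSpeech` come from the PR; the member values, the `Image` members, and the function names `inferModelType` and `endpointForModelType` are assumptions made for illustration.

```typescript
// Assumed shapes: the PR adds ModelType.TTS and ApiEndpointType.AudioSpeech,
// but the string values and the other members here are illustrative guesses.
const ModelType = { Chat: 'chat', Image: 'image', TTS: 'tts' } as const
type ModelType = (typeof ModelType)[keyof typeof ModelType]

const ApiEndpointType = { Chat: 'chat', Image: 'image', AudioSpeech: 'audio-speech' } as const
type ApiEndpointType = (typeof ApiEndpointType)[keyof typeof ApiEndpointType]

/** Infer a model type from the raw `type` string of a provider model entry. */
function inferModelType(rawType: string | undefined): ModelType {
  switch (rawType?.trim().toLowerCase()) {
    case 'tts':
      return ModelType.TTS
    case 'image':
      return ModelType.Image
    default:
      return ModelType.Chat // unknown or missing types fall back to chat
  }
}

/** Map an inferred model type to the endpoint it should be routed to. */
function endpointForModelType(type: ModelType): ApiEndpointType {
  switch (type) {
    case ModelType.TTS:
      return ApiEndpointType.AudioSpeech
    case ModelType.Image:
      return ApiEndpointType.Image
    default:
      return ApiEndpointType.Chat
  }
}
```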

Settings UI

- add model-level TTS settings fields (voice, responseFormat, speed, instructions)
- integrate TTS settings into ModelConfigDialog
- add TTS model type option and audio-speech endpoint option in model config
- auto-sync API endpoint with selected model type (chat / image / audio-speech)
- improve TTS type labeling in provider model list
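
For the settings fields listed above, a normalization pass on save might look like the following sketch. The four field names come from the PR; everything else is an assumption, including the function name `normalizeTtsSettings`, the allowed format list, and the 0.25 to 4.0 speed clamp, which mirror the documented limits of OpenAI's speech endpoint rather than anything confirmed in this PR.

```typescript
// Field names match the PR description; the rules below are assumptions.
interface TtsSettings {
  voice?: string
  responseFormat?: string
  speed?: number
  instructions?: string
}

// Assumed allow-list, taken from the formats OpenAI's speech API documents.
const ALLOWED_FORMATS = ['mp3', 'opus', 'aac', 'flac', 'wav', 'pcm']

/** Trim strings, drop empty or unsupported values, and clamp speed. */
function normalizeTtsSettings(input: TtsSettings): TtsSettings {
  const out: TtsSettings = {}

  const voice = input.voice?.trim()
  if (voice) out.voice = voice

  const format = input.responseFormat?.trim().toLowerCase()
  if (format && ALLOWED_FORMATS.includes(format)) out.responseFormat = format

  if (typeof input.speed === 'number' && Number.isFinite(input.speed)) {
    out.speed = Math.min(4, Math.max(0.25, input.speed))
  }

  const instructions = input.instructions?.trim()
  if (instructions) out.instructions = instructions

  return out
}
```

Dropping empty fields rather than storing empty strings keeps the saved `tts` config minimal, so the provider's own defaults apply when a field is left blank.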

Model DB

- mark relevant aihubmix TTS models as type: "tts"
- document skip for built-in xiaomimimo provider entry (not present)

Verification

- pnpm run typecheck
- pnpm run format
- pnpm run i18n
- pnpm run lint

All checks pass locally.

Summary by CodeRabbit

Release Notes

New Features
- Added text-to-speech (TTS) model support with configurable settings including voice selection, audio format, speed, and synthesis instructions.
- Integrated TTS model detection and routing for enhanced audio generation capabilities.
- Added new TTS provider models (OpenAI TTS, Google Gemini audio variants) to the model database.

coderabbitai · 2026-05-15T10:15:53Z

Walkthrough

This PR implements unified TTS provider support as a first-class model capability. It introduces ModelType.TTS and ApiEndpointType.AudioSpeech into shared contracts, adds provider routing to recognize TTS models, implements dual-pattern runtime execution (OpenAI-compatible /audio/speech and /chat/completions-with-audio), extends the frontend with TTS model detection and configuration UI, and populates the model database with TTS model entries.

Changes

Unified TTS Provider

| Layer / File(s) | Summary |
| --- | --- |
| Shared type and settings foundations `src/shared/model.ts`, `src/shared/ttsSettings.ts`, `src/shared/contracts/common.ts`, `src/shared/contracts/domainSchemas.ts`, `src/shared/types/model-db.ts`, `src/shared/types/presenters/legacy.presenters.d.ts` | `ModelType.TTS` and `ApiEndpointType.AudioSpeech` enums are added; `TtsSettings` interface with `voice`, `responseFormat`, `speed`, and `instructions` fields; helpers for model ID classification, MIME-type mapping, and settings normalization; model-db schema extended to parse `type: 'tts'`; `ModelConfig` contract gains optional `tts` field. |
| Backend presenter model routing `src/main/presenter/configPresenter/index.ts`, `src/main/presenter/configPresenter/modelConfig.ts` | Provider models with `type: 'tts'` are inferred as `ModelType.TTS`; `buildConfigFromProviderModel` maps `ModelType.TTS` to `ApiEndpointType.AudioSpeech` endpoint. |
| Provider TTS capability detection `src/main/presenter/llmProviderPresenter/providers/aiSdkProvider.ts` | `shouldUseOpenAITtsRoute` predicate identifies OpenAI-compatible TTS models; `shouldUseTts` callback is wired into `AiSdkRuntimeContext` for compatible non-cloud providers; "new-api" model type recognition extended to classify TTS patterns; computed `apiEndpoint` maps TTS to `AudioSpeech`. |
| Runtime TTS execution `src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts` | `AiSdkRuntimeContext` gains optional `shouldUseTts` hook; `runAiSdkCoreStream` branches to TTS when enabled: extracts text from last user message, executes either Pattern A (`/v1/audio/speech`) or Pattern B (`/v1/chat/completions` with audio), encodes response as base64 data URL, caches via device presenter, emits `image_data` stream event with cached audio, and terminates with stop event. |
| Frontend model detection `src/renderer/src/composables/useModelTypeDetection.ts` | `useModelTypeDetection` composable extended to recognize `'tts'` model type and expose `isTtsModel` computed boolean. |
| Frontend configuration UI `src/renderer/src/components/settings/TtsSettingsFields.vue`, `src/renderer/src/components/settings/ModelConfigDialog.vue`, `src/renderer/settings/components/ProviderModelList.vue` | New `TtsSettingsFields.vue` component renders voice, response format, speed, and instructions inputs with normalization; `ModelConfigDialog` conditionally renders TTS settings, adds `tts` type option, selects `audio-speech` endpoint for TTS, synchronizes endpoint based on model type, and normalizes settings on save; `ProviderModelList` adds TTS icon and i18n label. |
| Model database and documentation `resources/model-db/providers.json`, `docs/features/unified-tts-provider/*` | TTS model entries added: `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts`, `gemini-2.5-flash-preview-tts`, `gemini-2.5-pro-preview-tts` with modalities, limits, and metadata; feature specification, implementation plan, and task checklist documentation. |
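
The runtime row above describes extracting the last user message, sending a Pattern B request, and encoding the result as a base64 data URL. A rough sketch of those three steps follows. All names here (`lastUserText`, `buildChatAudioBody`, `toAudioDataUrl`) are hypothetical, the `audio: { voice, format }` field follows the OpenAI chat-completions audio output shape and is an assumption about what this PR sends, and the MIME table is illustrative rather than the one in `src/shared/ttsSettings.ts`.

```typescript
// Illustrative only; names and shapes are assumptions, not the PR's code.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

/** Return the text of the most recent user message. */
function lastUserText(messages: ChatMessage[]): string {
  const last = [...messages].reverse().find((m) => m.role === 'user')
  if (!last) throw new Error('No user message to synthesize')
  return last.content
}

/** Build a Pattern B (/chat/completions with audio) request body. */
function buildChatAudioBody(model: string, messages: ChatMessage[], voice = 'alloy', format = 'wav') {
  return {
    model,
    modalities: ['text', 'audio'], // the field this PR adds to the request
    audio: { voice, format }, // assumed, based on the OpenAI API shape
    messages: [{ role: 'user', content: lastUserText(messages) }]
  }
}

// Assumed format-to-MIME mapping for the data URL.
const AUDIO_MIME: Record<string, string> = {
  mp3: 'audio/mpeg',
  wav: 'audio/wav',
  flac: 'audio/flac',
  aac: 'audio/aac'
}

/** Wrap base64 audio in a data URL so it can be cached and rendered. */
function toAudioDataUrl(base64: string, format: string): string {
  const mime = AUDIO_MIME[format] ?? 'application/octet-stream'
  return `data:${mime};base64,${base64}`
}
```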

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

ThinkInAIXYZ/deepchat#1449: The main PR's unified TTS implementation directly extends the AI-SDK runtime by adding a new shouldUseTts hook and inserting a TTS routing path into runAiSdkCoreStream, which is part of the same AI-SDK runtime migration.
ThinkInAIXYZ/deepchat#1244: Main PR extends the existing apiEndpoint plumbing by updating ModelConfig/ModelConfigDialog routing and endpoint handling to add TTS-specific audio-speech behavior alongside chat/image endpoint selection.
ThinkInAIXYZ/deepchat#1291: Both PRs connect at the message/audio plumbing level; the main PR adds model-level TTS execution, while the retrieved PR implements renderer logic to emit and render audio blocks for voice/TTS output.

Suggested reviewers

zerob13

Poem

🐰 Hops through audio dreams so bright,
Where text meets voice in flight,
TTS now speaks with model grace,
A melody in every place! 🎵

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 52.63% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the core changes: stabilizing model-level TTS routing and adding TTS settings support across runtime, inference, and UI layers. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

resources/model-db/providers.json (1)

181588-181610: ⚡ Quick win

Normalize gpt-4o-mini-tts field shape with peer TTS entries.

Line 181588 onward omits limit and open_weights, while neighboring TTS entries include them. Keeping a consistent record shape reduces downstream null-guard branching.

Proposed diff

         {
           "id": "gpt-4o-mini-tts",
           "name": "gpt-4o-mini-tts",
           "display_name": "gpt-4o-mini-tts",
           "modalities": {
             "input": [
               "text"
             ],
             "output": [
               "audio"
             ]
           },
+          "limit": {
+            "context": 8192,
+            "output": 8192
+          },
           "temperature": false,
           "tool_call": false,
           "reasoning": {
             "supported": false
           },
           "attachment": false,
+          "open_weights": false,
           "cost": {
             "input": 0.48,
             "output": 0.96
           },
           "type": "tts"
         },

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@resources/model-db/providers.json` around lines 181588 - 181610, The JSON
entry for the TTS model with id/name "gpt-4o-mini-tts" is missing the standard
fields "limit" and "open_weights" used by other TTS entries; update the object
for "gpt-4o-mini-tts" to include the same "limit" structure (e.g.,
requests/characters/hour or whatever shape peers use) and the "open_weights"
boolean/metadata key with the same defaults as neighboring TTS entries so the
record shape matches peers and avoids extra null checks in consumers.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts`:
- Around line 313-318: The audio extraction is brittle: update the logic around
firstMessage and audioData to defensively check both paths and handle missing
fields—inspect firstMessage.audio?.data first, then iterate firstMessage.content
(if Array.isArray) and return the first item where item?.type === 'audio' &&
item?.audio?.data exists; if none found, also consider item?.audio (in case data
is top-level) or item?.data as fallback before throwing. Modify the code around
the audioData computation (referencing firstMessage and audioData) to perform
these guarded checks and only throw the Error('TTS response missing audio data
in choices[0].message.audio.data') after all fallbacks are exhausted.

In `@src/renderer/src/components/settings/TtsSettingsFields.vue`:
- Around line 54-61: The Label and Input in TtsSettingsFields.vue are using the
wrong i18n keys (settings.model.modelConfig.timeout.label and
settings.model.modelConfig.name.placeholder); replace them with dedicated
"instructions" keys (for example settings.model.instructions.label and
settings.model.instructions.placeholder) in the two t(...) calls used by Label
and the Input's placeholder, keep the binding to t and the `@update:model-value`
handler onInstructionsInput unchanged, and then add/update those new keys in the
i18n resource files so translations are available.



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d3fd03ba-47f7-4fee-9ec4-b6c314ffb6a4

📥 Commits

Reviewing files that changed from the base of the PR and between 5727aff and 8fa2e72.

📒 Files selected for processing (18)

docs/features/unified-tts-provider/plan.md
docs/features/unified-tts-provider/spec.md
docs/features/unified-tts-provider/tasks.md
resources/model-db/providers.json
src/main/presenter/configPresenter/index.ts
src/main/presenter/configPresenter/modelConfig.ts
src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts
src/main/presenter/llmProviderPresenter/providers/aiSdkProvider.ts
src/renderer/settings/components/ProviderModelList.vue
src/renderer/src/components/settings/ModelConfigDialog.vue
src/renderer/src/components/settings/TtsSettingsFields.vue
src/renderer/src/composables/useModelTypeDetection.ts
src/shared/contracts/common.ts
src/shared/contracts/domainSchemas.ts
src/shared/model.ts
src/shared/ttsSettings.ts
src/shared/types/model-db.ts
src/shared/types/presenters/legacy.presenters.d.ts

coderabbitai · 2026-05-15T10:19:25Z

+    const audioData =
+      firstMessage?.audio?.data ??
+      firstMessage?.content?.find((item) => item?.type === 'audio')?.audio?.data
+    if (!audioData) {
+      throw new Error('TTS response missing audio data in choices[0].message.audio.data')
+    }


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Pattern B audio extraction may miss edge cases.

The fallback path at Line 315 searches for content array items with type === 'audio', but the primary path checks message.audio.data. If the response structure varies (e.g., message.content exists but item.type is not 'audio', or item.audio is missing), the extraction will fail.

🛡️ Consider a more defensive extraction

-    const audioData =
-      firstMessage?.audio?.data ??
-      firstMessage?.content?.find((item) => item?.type === 'audio')?.audio?.data
+    const audioData =
+      firstMessage?.audio?.data ??
+      (Array.isArray(firstMessage?.content)
+        ? firstMessage.content.find((item) => item?.type === 'audio')?.audio?.data
+        : undefined)
     if (!audioData) {
       throw new Error('TTS response missing audio data in choices[0].message.audio.data')
     }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    const audioData =
-      firstMessage?.audio?.data ??
-      firstMessage?.content?.find((item) => item?.type === 'audio')?.audio?.data
-    if (!audioData) {
-      throw new Error('TTS response missing audio data in choices[0].message.audio.data')
-    }
+    const audioData =
+      firstMessage?.audio?.data ??
+      (Array.isArray(firstMessage?.content)
+        ? firstMessage.content.find((item) => item?.type === 'audio')?.audio?.data
+        : undefined)
+    if (!audioData) {
+      throw new Error('TTS response missing audio data in choices[0].message.audio.data')
+    }
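
To make the review point concrete: optional chaining only guards against `null` and `undefined`, so when `message.content` is a plain string the unguarded `.find(...)` call throws a TypeError instead of falling through. Below is a standalone sketch of a guarded extraction that also tries the extra fallbacks mentioned in the review prompt. The type names and the `extractAudioData` function are hypothetical, and the response shapes are assumptions about what OpenAI-compatible providers may return, not verified behavior.

```typescript
// Hypothetical response shapes; real providers may differ.
type AudioPart = { type?: string; audio?: { data?: string } | string; data?: string }
type AudioMessage = { audio?: { data?: string }; content?: string | AudioPart[] | null }

/** Return base64 audio from a chat-completions message, or undefined if absent. */
function extractAudioData(message: AudioMessage | undefined): string | undefined {
  // Primary path: choices[0].message.audio.data
  if (message?.audio?.data) return message.audio.data

  // Fallback path: only scan content when it really is an array of parts.
  const content = message?.content
  if (!Array.isArray(content)) return undefined

  for (const item of content) {
    if (item.type !== 'audio') continue
    // item.audio may be an object with `data` or, in some shapes, the data itself.
    const nested = typeof item.audio === 'string' ? item.audio : item.audio?.data
    if (nested) return nested
    if (item.data) return item.data
  }
  return undefined
}
```

A caller would throw the existing `TTS response missing audio data` error only when this returns `undefined`, so all fallbacks are exhausted first.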


coderabbitai · 2026-05-15T10:19:25Z

+    <div class="space-y-2">
+      <Label>{{ t('settings.model.modelConfig.timeout.label') }}</Label>
+      <Input
+        :model-value="tts.instructions ?? ''"
+        :placeholder="t('settings.model.modelConfig.name.placeholder')"
+        @update:model-value="onInstructionsInput"
+      />
+    </div>


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Incorrect i18n keys for instructions field.

Line 55 uses settings.model.modelConfig.timeout.label for the instructions label, and Line 58 uses settings.model.modelConfig.name.placeholder for the instructions placeholder. These keys appear to be copy-paste errors and do not match the semantic purpose of the "instructions" field.

🔧 Proposed fix to use dedicated i18n keys

     <div class="space-y-2">
-      <Label>{{ t('settings.model.modelConfig.timeout.label') }}</Label>
+      <Label>{{ t('settings.provider.voiceai.instructions.label') }}</Label>
       <Input
         :model-value="tts.instructions ?? ''"
-        :placeholder="t('settings.model.modelConfig.name.placeholder')"
+        :placeholder="t('settings.provider.voiceai.instructions.placeholder')"
         @update:model-value="onInstructionsInput"
       />
     </div>

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    <div class="space-y-2">
-      <Label>{{ t('settings.model.modelConfig.timeout.label') }}</Label>
-      <Input
-        :model-value="tts.instructions ?? ''"
-        :placeholder="t('settings.model.modelConfig.name.placeholder')"
-        @update:model-value="onInstructionsInput"
-      />
-    </div>
+    <div class="space-y-2">
+      <Label>{{ t('settings.provider.voiceai.instructions.label') }}</Label>
+      <Input
+        :model-value="tts.instructions ?? ''"
+        :placeholder="t('settings.provider.voiceai.instructions.placeholder')"
+        @update:model-value="onInstructionsInput"
+      />
+    </div>


fix(tts): stabilize routing and model settings

8fa2e72

coderabbitai Bot reviewed May 15, 2026


zerob13 merged commit f76d07f into dev May 16, 2026
4 checks passed

This was referenced May 18, 2026

fix: update TTS references and localization across components #1633

Merged

fix: enhance Gemini TTS integration and update related models and tests #1634

Merged

zerob13 merged 1 commit into dev from fix/model-level-tts-runtime-and-settings
