fix: enhance Gemini TTS integration and update related models and tests by zhangmo8 · Pull Request #1634 · ThinkInAIXYZ/deepchat

zhangmo8 · 2026-05-18T04:57:52Z

Summary by CodeRabbit

New Features
- Added Text-to-Speech support for Gemini models with audio output normalization to WAV format.
- Extended TTS compatibility to include Gemini's generateContent endpoint with voice selection capabilities.

Tests
- Added comprehensive test coverage for Gemini TTS and OpenAI-compatible chat audio TTS functionality.
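
The "audio output normalization to WAV format" item can be illustrated with a minimal sketch. This is not the code from `runtime.ts`; the function name `pcmToWav` and the default parameters (24 kHz, mono, 16-bit) are assumptions made for illustration. It only shows the general technique of prepending a 44-byte RIFF/WAVE header to raw little-endian PCM.

```typescript
// Hypothetical helper: wraps raw little-endian PCM in a 44-byte RIFF/WAVE header.
// The defaults (24 kHz, mono, 16-bit) are assumptions, not values taken from the PR.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8
  const byteRate = sampleRate * blockAlign
  const wav = new Uint8Array(44 + pcm.length)
  const view = new DataView(wav.buffer)

  // Write an ASCII chunk tag such as "RIFF" at the given byte offset.
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i))
    }
  }

  writeAscii(0, 'RIFF')
  view.setUint32(4, 36 + pcm.length, true) // total file size minus 8 bytes
  writeAscii(8, 'WAVE')
  writeAscii(12, 'fmt ')
  view.setUint32(16, 16, true) // size of the fmt chunk for linear PCM
  view.setUint16(20, 1, true) // audio format 1 = linear PCM
  view.setUint16(22, channels, true)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, byteRate, true)
  view.setUint16(32, blockAlign, true)
  view.setUint16(34, bitsPerSample, true)
  writeAscii(36, 'data')
  view.setUint32(40, pcm.length, true) // size of the sample data
  wav.set(pcm, 44) // copy the samples after the header
  return wav
}
```

A caller would pass the decoded audio bytes from the provider response and hand the result to whatever consumes WAV data; the real helper may take different parameters.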

coderabbitai · 2026-05-18T04:58:06Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e2d97889-f4a7-434c-a0cb-45e7a3ebc14a

📥 Commits

Reviewing files that changed from the base of the PR and between 3ebe5a5 and 3fbd8f2.

📒 Files selected for processing (4)

src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts
src/main/presenter/llmProviderPresenter/providers/aiSdkProvider.ts
src/shared/ttsSettings.ts
test/main/presenter/llmProviderPresenter/aiSdkRuntime.test.ts

📝 Walkthrough

This PR extends AI SDK TTS support to Gemini's generateContent API. It adds a Gemini TTS model allowlist to the shared TTS detection, unifies model routing logic, and implements Gemini-specific runtime helpers (base URL normalization and PCM-to-WAV conversion). It also adds a new Pattern C execution path for Gemini TTS, updates Pattern B (chat-completions TTS) to send dual messages (user + assistant), and adds test coverage for both paths.
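
A rough sketch of how allowlist-based detection could drive the choice between the three patterns is shown here. It is not the implementation in `ttsSettings.ts` or `runtime.ts`: the allowlist contents, the `normalizeModelId` helper, `isChatAudioTtsModel`, and `selectTtsPattern` are all hypothetical, and only the name `isGeminiGenerateContentTtsModel` and the C, B, A order come from the review summary.

```typescript
// Hypothetical allowlists. The real lists in src/shared/ttsSettings.ts are not
// shown in this PR summary, so these model IDs are placeholders.
const GEMINI_TTS_MODELS: string[] = [
  'gemini-2.5-flash-preview-tts',
  'gemini-2.5-pro-preview-tts'
]
const CHAT_AUDIO_TTS_MODELS: string[] = ['gpt-4o-audio-preview']

type TtsPattern = 'A' | 'B' | 'C'

// Assumed normalization: lowercase and drop any "provider/" or "models/" prefix.
function normalizeModelId(modelId: string): string {
  const id = modelId.trim().toLowerCase()
  return id.slice(id.lastIndexOf('/') + 1)
}

// Name taken from the review summary; the body is an assumption.
function isGeminiGenerateContentTtsModel(modelId: string): boolean {
  return GEMINI_TTS_MODELS.indexOf(normalizeModelId(modelId)) !== -1
}

// Hypothetical predicate for chat-completions audio models.
function isChatAudioTtsModel(modelId: string): boolean {
  return CHAT_AUDIO_TTS_MODELS.indexOf(normalizeModelId(modelId)) !== -1
}

// Check Pattern C first, then B, and fall back to Pattern A (standard TTS).
function selectTtsPattern(modelId: string): TtsPattern {
  if (isGeminiGenerateContentTtsModel(modelId)) return 'C'
  if (isChatAudioTtsModel(modelId)) return 'B'
  return 'A'
}
```

Checking the most specific allowlist first keeps the fallback path unchanged for models that match neither list.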

Changes

Gemini Text-to-Speech Integration

Layer / File(s)	Summary
TTS Model Detection Expansion `src/shared/ttsSettings.ts`	Adds `GEMINI_GENERATE_CONTENT_TTS_MODELS` allowlist and `isGeminiGenerateContentTtsModel()` predicate; merges Gemini detection into the aggregate `isTtsModelId()` to unify TTS model identification across standard, chat-audio, and Gemini generateContent models.
TTS Routing Decision `src/main/presenter/llmProviderPresenter/providers/aiSdkProvider.ts`	Updates TTS routing to use the unified `isTtsModelId()` predicate instead of separate model-type checks, consolidating provider-level TTS decision logic.
Gemini TTS Runtime Foundation `src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts` (imports, constants)	Adds imports for Gemini TTS model detection and provider context; defines default Gemini TTS voice and PCM audio parameters to support Gemini-specific execution.
Gemini TTS Helper Functions `src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts` (helpers)	Implements Gemini TTS utilities: prompt construction, base URL resolution with provider-specific handling (aihubmix, /v1beta routing), and PCM-to-WAV audio normalization that wraps raw PCM into WAV format.
TTS Pattern Implementation and Routing `src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts` (patterns, routing)	Updates Pattern B (chat-completions TTS) to include dual messages (user + assistant); adds Pattern C for Gemini generateContent with AUDIO response modalities; routes between Pattern C, B, and A based on model detection in `runAiSdkCoreStream`.
TTS Runtime Coverage and Test Fixtures `test/main/presenter/llmProviderPresenter/aiSdkRuntime.test.ts`	Adds Vitest cleanup; extends `providerFactory` mock with `normalizeGeminiBaseUrl`; validates chat-audio TTS with dual messages and WAV format; validates Gemini TTS with correct generateContent endpoint, headers, payload shape, and audio event emission.
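
For readers unfamiliar with the request that Pattern C targets, the sketch below builds a generateContent call with AUDIO response modalities, following the publicly documented Gemini REST payload shape. It is illustrative only: `buildGeminiTtsRequest`, `extractAudioBase64`, the `/v1beta` suffix handling, and the default voice `'Kore'` are assumptions, and the PR's actual helpers, headers, and default voice may differ.

```typescript
interface GeminiTtsRequest {
  url: string
  headers: Record<string, string>
  body: {
    contents: Array<{ role: string; parts: Array<{ text: string }> }>
    generationConfig: {
      responseModalities: string[]
      speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: string } } }
    }
  }
}

// Hypothetical builder; the default voice name is an assumption.
function buildGeminiTtsRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  text: string,
  voiceName = 'Kore'
): GeminiTtsRequest {
  // Assumed normalization: strip trailing slashes and ensure one /v1beta suffix.
  const root = baseUrl.replace(/\/+$/, '')
  const versioned = /\/v1beta$/.test(root) ? root : `${root}/v1beta`
  return {
    url: `${versioned}/models/${encodeURIComponent(model)}:generateContent`,
    headers: { 'Content-Type': 'application/json', 'x-goog-api-key': apiKey },
    body: {
      contents: [{ role: 'user', parts: [{ text }] }],
      generationConfig: {
        responseModalities: ['AUDIO'],
        speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName } } }
      }
    }
  }
}

interface GeminiTtsResponse {
  candidates?: Array<{
    content?: { parts?: Array<{ inlineData?: { mimeType?: string; data?: string } }> }
  }>
}

// Return the first base64 audio payload found in the response, if any.
function extractAudioBase64(response: GeminiTtsResponse): string | undefined {
  for (const candidate of response.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      if (part.inlineData?.data) return part.inlineData.data
    }
  }
  return undefined
}
```

The base64 string returned by `extractAudioBase64` would still need decoding and, for raw PCM output, wrapping in a WAV container before playback.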

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

ThinkInAIXYZ/deepchat#1632: Extends model-level unified TTS routing on the same runtime and shared TTS settings code path.
ThinkInAIXYZ/deepchat#1633: Overlaps at TTS runtime pathways by modifying TTS pattern dispatch via proxy.
ThinkInAIXYZ/deepchat#1511: Shares Gemini base-URL normalization and /v1beta routing logic.

Suggested reviewers

zerob13

Poem

🐰 Hops with glee through TTS flows,
Gemini's voice now softly flows,
Pattern C with AUDIO calls,
PCM wraps in WAV's walls,
Tests confirm the chorus sings! 🎵

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 36.36% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: enhance Gemini TTS integration and update related models and tests' accurately summarizes the main changes: adding Gemini TTS support (Pattern C with generateContent endpoint), updating TTS model detection logic across multiple files, and adding comprehensive test coverage for both chat-audio and Gemini TTS patterns.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

fix: enhance Gemini TTS integration and update related models and tests

3fbd8f2

zhangmo8 merged commit 2b8816d into dev May 18, 2026
3 checks passed

zhangmo8 merged 1 commit into dev from gemini-tts-fix
