
fix: enhance Gemini TTS integration and update related models and tests #1634

Merged
zhangmo8 merged 1 commit into dev from gemini-tts-fix on May 18, 2026

@zhangmo8 (Collaborator) commented May 18, 2026

Summary by CodeRabbit

  • New Features

    • Added Text-to-Speech support for Gemini models with audio output normalization to WAV format.
    • Extended TTS compatibility to include Gemini's generateContent endpoint with voice selection capabilities.
  • Tests

    • Added comprehensive test coverage for Gemini TTS and OpenAI-compatible chat audio TTS functionality.
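Normalizing Gemini audio to WAV comes down to putting a RIFF/WAVE header in front of the raw PCM samples. The sketch below is illustrative and is not the PR's code. It assumes Gemini returns 16-bit little-endian mono PCM at 24 kHz, signalled by a MIME type such as audio/L16;codec=pcm;rate=24000. The names pcmToWav and parsePcmSampleRate are hypothetical.

```typescript
// Hypothetical helper: read the sample rate from a PCM MIME type such as
// "audio/L16;codec=pcm;rate=24000". Falls back to 24 kHz (an assumed default).
function parsePcmSampleRate(mimeType: string, fallback = 24000): number {
  const match = /rate=(\d+)/i.exec(mimeType);
  return match ? Number(match[1]) : fallback;
}

// Hypothetical helper: wrap raw little-endian PCM in a 44-byte RIFF/WAVE header.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign; // bytes per second
  const buffer = new ArrayBuffer(44 + pcm.length);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size = total file size - 8
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for plain PCM
  view.setUint16(20, 1, true); // audio format 1 = linear PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // data chunk size in bytes

  const wav = new Uint8Array(buffer);
  wav.set(pcm, 44); // the PCM samples follow the header unchanged
  return wav;
}
```

In Node.js, the base64 payload can be decoded with Buffer.from(data, "base64"). Buffer is a Uint8Array subclass, so the result can go straight to pcmToWav together with parsePcmSampleRate(mimeType).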


@coderabbitai (Contributor, bot) commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e2d97889-f4a7-434c-a0cb-45e7a3ebc14a

📥 Commits

Reviewing files that changed from the base of the PR, between commits 3ebe5a5 and 3fbd8f2.

📒 Files selected for processing (4)
  • src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts
  • src/main/presenter/llmProviderPresenter/providers/aiSdkProvider.ts
  • src/shared/ttsSettings.ts
  • test/main/presenter/llmProviderPresenter/aiSdkRuntime.test.ts

📝 Walkthrough

This PR extends AI SDK TTS support to Gemini's generateContent API. It adds a Gemini TTS model allowlist to the shared TTS detection and unifies the model routing logic. It also implements Gemini-specific runtime helpers (base URL normalization and PCM-to-WAV conversion) and adds a new Pattern C execution path for Gemini TTS. Pattern B (chat-completions TTS) now sends two messages (user + assistant), and new tests cover both paths.
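The following is a minimal sketch of how allowlist detection and the C → B → A routing could fit together. It is for illustration only. The following parts are assumptions, not taken from the PR:

- the allowlist contents
- the prefix-stripping rule in normalizeModelId
- the helper names other than GEMINI_GENERATE_CONTENT_TTS_MODELS, isGeminiGenerateContentTtsModel and isTtsModelId
- the mapping of Pattern A to the standard speech endpoint

```typescript
// Illustrative allowlists; the real lists in src/shared/ttsSettings.ts may differ.
const GEMINI_GENERATE_CONTENT_TTS_MODELS = new Set([
  "gemini-2.5-flash-preview-tts",
  "gemini-2.5-pro-preview-tts",
]);
const CHAT_AUDIO_TTS_MODELS = new Set(["gpt-4o-audio-preview"]); // hypothetical
const SPEECH_ENDPOINT_TTS_MODELS = new Set(["tts-1", "tts-1-hd", "gpt-4o-mini-tts"]); // hypothetical

// Assumption: lowercase the ID and drop any "vendor/" or "models/" prefix.
function normalizeModelId(modelId: string): string {
  const id = modelId.trim().toLowerCase();
  const slash = id.lastIndexOf("/");
  return slash >= 0 ? id.slice(slash + 1) : id;
}

function isGeminiGenerateContentTtsModel(modelId: string): boolean {
  return GEMINI_GENERATE_CONTENT_TTS_MODELS.has(normalizeModelId(modelId));
}

function isChatAudioTtsModel(modelId: string): boolean {
  return CHAT_AUDIO_TTS_MODELS.has(normalizeModelId(modelId));
}

function isSpeechEndpointTtsModel(modelId: string): boolean {
  return SPEECH_ENDPOINT_TTS_MODELS.has(normalizeModelId(modelId));
}

// Aggregate predicate: one check decides whether the provider takes a TTS path at all.
function isTtsModelId(modelId: string): boolean {
  return (
    isSpeechEndpointTtsModel(modelId) ||
    isChatAudioTtsModel(modelId) ||
    isGeminiGenerateContentTtsModel(modelId)
  );
}

type TtsPattern = "A" | "B" | "C" | null;

// Gemini generateContent (C) first, then chat-completions audio (B),
// then the standard speech endpoint (A); null means "not a TTS model".
function selectTtsPattern(modelId: string): TtsPattern {
  if (isGeminiGenerateContentTtsModel(modelId)) return "C";
  if (isChatAudioTtsModel(modelId)) return "B";
  if (isSpeechEndpointTtsModel(modelId)) return "A";
  return null;
}
```

In this sketch the allowlists do not overlap, so the check order only matters if a model ID is ever added to more than one list.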

Changes

Gemini Text-to-Speech Integration

  • TTS Model Detection Expansion (src/shared/ttsSettings.ts): Adds the GEMINI_GENERATE_CONTENT_TTS_MODELS allowlist and the isGeminiGenerateContentTtsModel() predicate. Gemini detection is merged into the aggregate isTtsModelId(), so standard, chat-audio, and Gemini generateContent models are all identified through one predicate.
  • TTS Routing Decision (src/main/presenter/llmProviderPresenter/providers/aiSdkProvider.ts): Replaces the separate model-type checks with the unified isTtsModelId() predicate, consolidating the provider-level TTS decision.
  • Gemini TTS Runtime Foundation (src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts, imports and constants): Imports Gemini TTS model detection and the provider context. Defines the default Gemini TTS voice and the PCM audio parameters used by the Gemini-specific path.
  • Gemini TTS Helper Functions (src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts, helpers): Implements prompt construction and base URL resolution with provider-specific handling (aihubmix, /v1beta routing). Also adds PCM-to-WAV normalization, which wraps raw PCM in a WAV container.
  • TTS Pattern Implementation and Routing (src/main/presenter/llmProviderPresenter/aiSdk/runtime.ts, patterns and routing): Updates Pattern B (chat-completions TTS) to send two messages (user + assistant). Adds Pattern C for Gemini generateContent with AUDIO response modalities. In runAiSdkCoreStream, requests are routed to Pattern C, B, or A based on model detection.
  • TTS Runtime Coverage and Test Fixtures (test/main/presenter/llmProviderPresenter/aiSdkRuntime.test.ts): Adds Vitest cleanup and extends the providerFactory mock with normalizeGeminiBaseUrl. Validates chat-audio TTS (two messages, WAV format) and Gemini TTS (generateContent endpoint, headers, payload shape, and audio event emission).
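The Pattern C request can be sketched against the public Gemini REST request shape: generationConfig.responseModalities plus speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName. This is a hypothetical illustration, not the PR's runtime code. Several parts are assumptions:

- the default voice "Kore"
- the helper names
- the simple /v1beta normalization

The PR's provider-specific handling (for example aihubmix) is not reproduced here.

```typescript
// Hypothetical default; the constant in runtime.ts may name a different prebuilt voice.
const DEFAULT_GEMINI_TTS_VOICE = "Kore";

// Assumed normalization: drop trailing slashes and make sure the path ends in /v1beta.
function normalizeGeminiTtsBaseUrl(baseUrl: string): string {
  const trimmed = baseUrl.trim().replace(/\/+$/, "");
  return trimmed.endsWith("/v1beta") ? trimmed : `${trimmed}/v1beta`;
}

interface GeminiTtsRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build a generateContent call that asks for AUDIO output in the given prebuilt voice.
function buildGeminiTtsRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  text: string,
  voiceName: string = DEFAULT_GEMINI_TTS_VOICE,
): GeminiTtsRequest {
  const body = {
    contents: [{ role: "user", parts: [{ text }] }],
    generationConfig: {
      responseModalities: ["AUDIO"], // request audio instead of text
      speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName } } },
    },
  };
  return {
    url: `${normalizeGeminiTtsBaseUrl(baseUrl)}/models/${encodeURIComponent(model)}:generateContent`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify(body),
    },
  };
}

type GeminiPart = { text?: string; inlineData?: { mimeType: string; data: string } };
type GeminiResponse = { candidates?: { content?: { parts?: GeminiPart[] } }[] };

// Return the first inline audio part (base64 PCM plus its MIME type), or null.
function extractGeminiAudio(res: GeminiResponse): { mimeType: string; data: string } | null {
  const parts = res.candidates?.[0]?.content?.parts ?? [];
  const audio = parts.find((p) => p.inlineData?.mimeType.startsWith("audio/"));
  return audio?.inlineData ?? null;
}
```

The init object can be passed directly as fetch(req.url, req.init). The base64 data returned by extractGeminiAudio is raw PCM and still needs WAV wrapping before playback.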

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested reviewers

  • zerob13

Poem

🐰 Hops with glee through TTS flows,
Gemini's voice now softly flows,
Pattern C with AUDIO calls,
PCM wraps in WAV's walls,
Tests confirm the chorus sings! 🎵

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 36.36%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that lack them to meet the threshold.
✅ Passed checks (4 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title 'fix: enhance Gemini TTS integration and update related models and tests' accurately summarizes the main changes. These are Gemini TTS support (Pattern C with the generateContent endpoint), updated TTS model detection across multiple files, and test coverage for both the chat-audio and Gemini TTS patterns.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



zhangmo8 merged commit 2b8816d into dev on May 18, 2026 (3 checks passed).