feat(voice-server): Add Google Cloud TTS as alternative provider #687

Closed
fayerman-source wants to merge 1 commit into danielmiessler:main from fayerman-source:feat/google-cloud-tts

Conversation

@fayerman-source
Contributor

Summary

Adds Google Cloud Text-to-Speech as a second TTS backend alongside ElevenLabs:

  • Provider selection via settings.json daidentity.ttsProvider ("elevenlabs" or "google-cloud")
  • Backwards compatible — defaults to ElevenLabs when ttsProvider is not set
  • No new dependencies — uses Google's REST API directly via fetch (no SDK)
  • Configurable voice via daidentity.googleCloudVoice (language, voice name, type, rate, pitch)
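
As a rough illustration of the "REST API directly via fetch" point, the sketch below builds a request for Google's public `text:synthesize` endpoint. It is not the code from this PR: the names `GoogleCloudVoice`, `buildGoogleTtsRequest`, and `extractAudioBase64` are hypothetical, and the MP3 encoding is an assumption. `voiceType` is deliberately not sent here, on the assumption that the voice name already identifies the voice family.

```typescript
// Hypothetical shape mirroring daidentity.googleCloudVoice (voiceType omitted in this sketch).
interface GoogleCloudVoice {
  languageCode: string;
  voiceName: string;
  speakingRate: number;
  pitch: number;
}

interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the text:synthesize request. Performs no network I/O.
function buildGoogleTtsRequest(
  text: string,
  voice: GoogleCloudVoice,
  apiKey: string,
): TtsRequest {
  return {
    // The API key is passed as a query parameter.
    url: `https://texttospeech.googleapis.com/v1/text:synthesize?key=${encodeURIComponent(apiKey)}`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      input: { text },
      voice: { languageCode: voice.languageCode, name: voice.voiceName },
      // MP3 output is an assumption of this sketch.
      audioConfig: {
        audioEncoding: "MP3",
        speakingRate: voice.speakingRate,
        pitch: voice.pitch,
      },
    }),
  };
}

// Pulls the base64-encoded audio out of a parsed JSON response.
function extractAudioBase64(payload: unknown): string {
  if (typeof payload === "object" && payload !== null && "audioContent" in payload) {
    const audio = (payload as { audioContent: unknown }).audioContent;
    if (typeof audio === "string") return audio;
  }
  throw new Error("Google Cloud TTS response did not contain audioContent");
}
```

A caller would pass `url` and the remaining fields to `fetch`, parse the JSON response, and decode the base64 string returned by `extractAudioBase64` before playback.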

Why Google Cloud TTS

  • Free tier: 4M characters/month (Standard) or 1M (WaveNet) vs ElevenLabs' 10K
  • No attribution requirement on the free tier
  • Good fallback when ElevenLabs quota runs out

Configuration

Add to ~/.env:

GOOGLE_CLOUD_API_KEY=your_key_here

Add to ~/.claude/settings.json:

{
  "daidentity": {
    "ttsProvider": "google-cloud",
    "googleCloudVoice": {
      "languageCode": "en-US",
      "voiceName": "en-US-Neural2-D",
      "voiceType": "NEURAL2",
      "speakingRate": 1.0,
      "pitch": 0.0
    }
  }
}

Or keep using ElevenLabs by not setting ttsProvider (or setting it to "elevenlabs").
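
The fallback behavior described above could be sketched roughly as follows. This is a sketch only: `resolveTtsProvider` is a hypothetical name, and treating unrecognized values the same as an unset `ttsProvider` is an assumption, not something this PR states.

```typescript
type TtsProvider = "elevenlabs" | "google-cloud";

// Reads daidentity.ttsProvider from parsed settings.json content.
// Anything other than "google-cloud" (including a missing key) falls back
// to ElevenLabs; that handling of unknown values is an assumption.
function resolveTtsProvider(settings: unknown): TtsProvider {
  const daidentity = (settings as { daidentity?: { ttsProvider?: unknown } } | null)
    ?.daidentity;
  return daidentity?.ttsProvider === "google-cloud" ? "google-cloud" : "elevenlabs";
}
```

With this shape, an existing settings.json that has no `ttsProvider` key keeps resolving to ElevenLabs, which is the backwards-compatible behavior the description promises.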

Files Changed

  • Releases/v3.0/.claude/VoiceServer/server.ts — Multi-provider TTS routing, Google Cloud TTS implementation

Context

This is a re-implementation of PR #285 (merged 2026-01-01, lost in v3.0 restructuring) targeting the current v3.0 architecture. The original code lived in Packs/kai-voice-system/ which no longer exists.

Closes #682

@kaimagnus
Collaborator

Nice addition of Google Cloud TTS! This has merge conflicts with recent VoiceServer changes (commit 95d65cc). Could you rebase on main? We'll merge once the conflicts are resolved. Thanks! 🙏

Adds Google Cloud Text-to-Speech as a second TTS backend alongside
ElevenLabs. Provider is selected via settings.json daidentity.ttsProvider
("elevenlabs" or "google-cloud"). Defaults to ElevenLabs for backwards
compatibility.

Google Cloud TTS supports WaveNet, Neural2, and Standard voice types,
configurable via daidentity.googleCloudVoice in settings.json. Uses the
REST API directly (no SDK dependency) with GOOGLE_CLOUD_API_KEY from
~/.env.

Free tier comparison: Google Cloud offers 4M chars/month (Standard) vs
ElevenLabs' 10K chars/month.

Closes danielmiessler#682
@fayerman-source
Contributor Author

Rebased! The conflict was in loadVoiceConfig's return statements — your desktopNotifications field and our new ttsProvider/googleVoice fields both needed to land in LoadedVoiceConfig, so I merged them all in. Three return sites updated. Should be clean now. Ready when you are!

rwilson131 added a commit to rwilson131/Personal_AI_Infrastructure that referenced this pull request Mar 1, 2026
…ovider

Adds Kokoro-FastAPI as a cost-free, self-hosted TTS alternative to
ElevenLabs. Kokoro exposes an OpenAI-compatible /v1/audio/speech
endpoint, making this patch reusable for any OpenAI-compatible TTS
service (e.g., OpenedAI-speech).

ElevenLabs remains the default — this is a zero-breaking-change opt-in
via settings.json daidentity.ttsProvider.

Changes:
- LoadedVoiceConfig: adds ttsProvider, ttsBaseUrl, ttsVoice fields
- loadVoiceConfig(): reads daidentity.ttsProvider + openaiCompatibleTts
  with env var (TTS_PROVIDER, TTS_BASE_URL, TTS_VOICE) fallback
- generateSpeechOpenAI(): new function for OpenAI-compatible endpoint
- sendNotification(): routes to Kokoro or ElevenLabs based on provider
- VOICE_ENABLED_GLOBAL: env var to mute all TTS without restarting
- /health: reports active provider and Kokoro URL/voice
- startup logs: prints active provider on boot
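
For readers unfamiliar with the OpenAI-compatible endpoint mentioned above, a request to `/v1/audio/speech` might be assembled as in this sketch. It is not the referenced commit's `generateSpeechOpenAI()`: the function name `buildOpenAiSpeechRequest`, its signature, and the default model name `"kokoro"` are all assumptions made for illustration.

```typescript
interface SpeechRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a POST for an OpenAI-compatible /v1/audio/speech endpoint.
// The default model name is an assumption; servers may expect another value.
function buildOpenAiSpeechRequest(
  baseUrl: string,
  voice: string,
  text: string,
  model: string = "kokoro",
): SpeechRequest {
  // Drop trailing slashes so the joined path has exactly one separator.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/v1/audio/speech`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: text, voice }),
  };
}
```

Because only the base URL and voice vary, the same builder would in principle work for any service that exposes this endpoint, which matches the reusability claim in the commit message.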

Relates to danielmiessler#687 (Google Cloud TTS) — complementary self-hosted option.
@danielmiessler
Owner

Thank you for adding Google Cloud TTS as an alternative provider! This PR targets the v3.0 VoiceServer architecture, which was restructured in v4.0.

If you'd like to contribute alternative TTS provider support for the current architecture, please see Releases/v4.0.2 for the latest codebase and feel free to resubmit. Thanks for the contribution!

fayerman-source added a commit to fayerman-source/Personal_AI_Infrastructure that referenced this pull request Mar 2, 2026
Adds Google Cloud Text-to-Speech as alternative TTS provider and
fixes audio playback on Linux (WSL2).

- Google Cloud TTS via REST API (no SDK), configurable in settings.json
- Cross-platform audio: afplay (macOS), mpv/ffplay/aplay (Linux)
- Cross-platform notifications: osascript (macOS), notify-send (Linux)
- Backwards compatible: defaults to ElevenLabs when ttsProvider unset
- Accepts GOOGLE_CLOUD_API_KEY or GOOGLE_API_KEY from ~/.env
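
The cross-platform audio selection listed above could look something like the following sketch. The function name `pickAudioPlayer`, the `isAvailable` callback, and the mpv, ffplay, aplay preference order are assumptions drawn from the order in the list, not the commit's actual code.

```typescript
// Linux candidates, tried in the order given in the commit message.
const LINUX_PLAYERS: readonly string[] = ["mpv", "ffplay", "aplay"];

// Returns the command to use for playback, or null if none applies.
// `platform` is expected to hold a value like Node's process.platform.
// `isAvailable` is a hypothetical callback that reports whether a command
// exists on PATH.
function pickAudioPlayer(
  platform: string,
  isAvailable: (command: string) => boolean,
): string | null {
  if (platform === "darwin") {
    // afplay ships with macOS, so this sketch does not probe for it.
    return "afplay";
  }
  if (platform === "linux") {
    for (const player of LINUX_PLAYERS) {
      if (isAvailable(player)) return player;
    }
  }
  return null;
}
```

Taking the platform and the availability check as parameters keeps the selection logic free of side effects, so it can be exercised without spawning any processes.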

Re-implementation of danielmiessler#687 with additional Linux/WSL2 support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Feature Request: Add Google Cloud TTS as alternative voice provider

3 participants