feat(voice-server): Add Google Cloud TTS as alternative provider#687
fayerman-source wants to merge 1 commit into danielmiessler:main
Nice addition of Google Cloud TTS! This has merge conflicts with recent VoiceServer changes (commit 95d65cc). Could you rebase on main? We'll merge once the conflicts are resolved. Thanks! 🙏
Adds Google Cloud Text-to-Speech as a second TTS backend alongside
ElevenLabs. Provider is selected via settings.json daidentity.ttsProvider
("elevenlabs" or "google-cloud"). Defaults to ElevenLabs for backwards
compatibility.
Google Cloud TTS supports WaveNet, Neural2, and Standard voice types,
configurable via daidentity.googleCloudVoice in settings.json. Uses the
REST API directly (no SDK dependency) with GOOGLE_CLOUD_API_KEY from
~/.env.
Free tier comparison: Google Cloud offers 4M chars/month (Standard) vs
ElevenLabs' 10K chars/month.
Closes danielmiessler#682
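
The direct REST call described in the commit message above could look roughly like the following. This is a minimal sketch, not the code from the PR: the names `GoogleCloudVoice`, `buildSynthesizeRequest`, and `synthesize` are hypothetical, and it assumes the public `text:synthesize` method of the Google Cloud Text-to-Speech v1 API with an API key passed as the `key` query parameter and MP3 output.

```typescript
// Hypothetical shape mirroring the daidentity.googleCloudVoice settings block.
interface GoogleCloudVoice {
  languageCode: string;
  voiceName: string;
  speakingRate: number;
  pitch: number;
}

interface SynthesizeRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build the REST request for the text:synthesize method (no SDK involved).
function buildSynthesizeRequest(
  text: string,
  voice: GoogleCloudVoice,
  apiKey: string,
): SynthesizeRequest {
  return {
    url: `https://texttospeech.googleapis.com/v1/text:synthesize?key=${encodeURIComponent(apiKey)}`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        input: { text },
        voice: { languageCode: voice.languageCode, name: voice.voiceName },
        audioConfig: {
          audioEncoding: "MP3", // assumption: the server plays MP3 audio
          speakingRate: voice.speakingRate,
          pitch: voice.pitch,
        },
      }),
    },
  };
}

// Send the request and decode the base64 audioContent field into raw bytes.
async function synthesize(
  text: string,
  voice: GoogleCloudVoice,
  apiKey: string,
): Promise<Uint8Array> {
  const { url, init } = buildSynthesizeRequest(text, voice, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Google Cloud TTS request failed: ${res.status}`);
  const data = (await res.json()) as { audioContent: string };
  return Uint8Array.from(atob(data.audioContent), (c) => c.charCodeAt(0));
}
```

Separating request construction from the network call keeps the payload easy to inspect and test without an API key.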
8d83e09 to e4ac85b
Rebased! The conflict was in
…ovider
Adds Kokoro-FastAPI as a cost-free, self-hosted TTS alternative to ElevenLabs. Kokoro exposes an OpenAI-compatible /v1/audio/speech endpoint, making this patch reusable for any OpenAI-compatible TTS service (e.g., OpenedAI-speech). ElevenLabs remains the default; this is a zero-breaking-change opt-in via settings.json daidentity.ttsProvider.
Changes:
- LoadedVoiceConfig: adds ttsProvider, ttsBaseUrl, ttsVoice fields
- loadVoiceConfig(): reads daidentity.ttsProvider + openaiCompatibleTts with env var (TTS_PROVIDER, TTS_BASE_URL, TTS_VOICE) fallback
- generateSpeechOpenAI(): new function for OpenAI-compatible endpoint
- sendNotification(): routes to Kokoro or ElevenLabs based on provider
- VOICE_ENABLED_GLOBAL: env var to mute all TTS without restarting
- /health: reports active provider and Kokoro URL/voice
- startup logs: prints active provider on boot
Relates to danielmiessler#687 (Google Cloud TTS) as a complementary self-hosted option.
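
For illustration, a request to the OpenAI-compatible /v1/audio/speech endpoint mentioned in this commit might be assembled as below. This is a sketch under assumptions, not the commit's generateSpeechOpenAI(): the `OpenAICompatibleTtsConfig` type, the `buildSpeechRequest` helper, and the `model` field are hypothetical, and the model and voice identifiers a given server accepts will vary.

```typescript
// Hypothetical config; ttsBaseUrl and ttsVoice echo the field names in the commit message.
interface OpenAICompatibleTtsConfig {
  ttsBaseUrl: string;
  ttsVoice: string;
  model: string; // assumption: the server expects a model identifier
}

interface SpeechRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build a POST request for an OpenAI-compatible /v1/audio/speech endpoint.
function buildSpeechRequest(text: string, cfg: OpenAICompatibleTtsConfig): SpeechRequest {
  // Strip trailing slashes so the path is not doubled.
  const base = cfg.ttsBaseUrl.replace(/\/+$/, "");
  return {
    url: `${base}/v1/audio/speech`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: cfg.model,
        input: text,
        voice: cfg.ttsVoice,
        response_format: "mp3",
      }),
    },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch`; the response body is the audio itself rather than a JSON envelope.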
Thank you for adding Google Cloud TTS as an alternative provider! This PR targets the v3.0 VoiceServer architecture, which was restructured in v4.0. If you'd like to contribute alternative TTS provider support for the current architecture, please see Releases/v4.0.2 for the latest codebase and feel free to resubmit. Thanks for the contribution!
Adds Google Cloud Text-to-Speech as alternative TTS provider and fixes audio playback on Linux (WSL2).
- Google Cloud TTS via REST API (no SDK), configurable in settings.json
- Cross-platform audio: afplay (macOS), mpv/ffplay/aplay (Linux)
- Cross-platform notifications: osascript (macOS), notify-send (Linux)
- Backwards compatible: defaults to ElevenLabs when ttsProvider unset
- Accepts GOOGLE_CLOUD_API_KEY or GOOGLE_API_KEY from ~/.env
Re-implementation of danielmiessler#687 with additional Linux/WSL2 support.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
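
The cross-platform audio fallback listed in this commit (afplay on macOS; mpv, ffplay, or aplay on Linux) can be sketched as a pure selection function. The names `PlayerCommand` and `pickPlayer`, the preference order, and the quiet-mode flags are assumptions for illustration; the availability check is injected so the logic can be exercised without touching the system.

```typescript
type PlayerCommand = { cmd: string; args: string[] };

// Choose an audio player for the given platform, or null if none applies.
// `isAvailable` is a caller-supplied check (for example, a PATH lookup).
function pickPlayer(
  platform: string,
  file: string,
  isAvailable: (cmd: string) => boolean,
): PlayerCommand | null {
  if (platform === "darwin") {
    // afplay ships with macOS, so no availability probe is done here.
    return { cmd: "afplay", args: [file] };
  }
  if (platform === "linux") {
    // Assumed preference order; the first installed player wins.
    const candidates: PlayerCommand[] = [
      { cmd: "mpv", args: ["--no-video", "--really-quiet", file] },
      { cmd: "ffplay", args: ["-nodisp", "-autoexit", "-loglevel", "quiet", file] },
      { cmd: "aplay", args: ["-q", file] },
    ];
    return candidates.find((c) => isAvailable(c.cmd)) ?? null;
  }
  return null;
}
```

In a server, the result would typically be handed to a process spawner with `process.platform` as the first argument. Note that aplay plays WAV/PCM data and does not decode MP3, so it is only a usable fallback when the audio is in a format it understands.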
Summary
Adds Google Cloud Text-to-Speech as a second TTS backend alongside ElevenLabs:
- Provider selected via settings.json → daidentity.ttsProvider ("elevenlabs" or "google-cloud")
- Defaults to ElevenLabs when ttsProvider is not set
- Uses the REST API via fetch (no SDK)
- Voice configurable via daidentity.googleCloudVoice (language, voice name, type, rate, pitch)
Why Google Cloud TTS
Configuration
Add your GOOGLE_CLOUD_API_KEY to ~/.env.
Add to ~/.claude/settings.json:
```json
{
  "daidentity": {
    "ttsProvider": "google-cloud",
    "googleCloudVoice": {
      "languageCode": "en-US",
      "voiceName": "en-US-Neural2-D",
      "voiceType": "NEURAL2",
      "speakingRate": 1.0,
      "pitch": 0.0
    }
  }
}
```
Or keep using ElevenLabs by not setting ttsProvider (or setting it to "elevenlabs").
Files Changed
- Releases/v3.0/.claude/VoiceServer/server.ts: Multi-provider TTS routing, Google Cloud TTS implementation
Context
This is a re-implementation of PR #285 (merged 2026-01-01, lost in v3.0 restructuring) targeting the current v3.0 architecture. The original code lived in Packs/kai-voice-system/, which no longer exists.
Closes #682
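
As a rough sketch of the provider selection described under Summary and Configuration, the settings could be resolved as follows. The names `resolveProvider`, `resolveGoogleVoice`, and `DEFAULT_VOICE` are hypothetical, the default voice values are simply copied from the example configuration, and treating unrecognized `ttsProvider` values as "elevenlabs" is an assumption rather than documented behavior.

```typescript
type TtsProvider = "elevenlabs" | "google-cloud";

interface GoogleCloudVoiceSettings {
  languageCode: string;
  voiceName: string;
  voiceType: string;
  speakingRate: number;
  pitch: number;
}

// Subset of settings.json relevant to TTS routing.
interface Settings {
  daidentity?: {
    ttsProvider?: string;
    googleCloudVoice?: Partial<GoogleCloudVoiceSettings>;
  };
}

// Assumed defaults, taken from the example configuration in this PR.
const DEFAULT_VOICE: GoogleCloudVoiceSettings = {
  languageCode: "en-US",
  voiceName: "en-US-Neural2-D",
  voiceType: "NEURAL2",
  speakingRate: 1.0,
  pitch: 0.0,
};

// ElevenLabs stays the default unless "google-cloud" is set explicitly.
function resolveProvider(settings: Settings): TtsProvider {
  return settings.daidentity?.ttsProvider === "google-cloud" ? "google-cloud" : "elevenlabs";
}

// Overlay any user-supplied voice fields on top of the defaults.
function resolveGoogleVoice(settings: Settings): GoogleCloudVoiceSettings {
  return { ...DEFAULT_VOICE, ...settings.daidentity?.googleCloudVoice };
}
```

Falling back to ElevenLabs for missing or unknown values keeps existing installations working without any settings change.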