feat(voice-server): add TTS provider abstraction with fallback order #173
aa9d822 to
3e1f895
Compare
Manual Testing Results
Windows WSL2 Testing (via PowerShell MediaPlayer)
Test Plan Progress
Notes
🤖 Manually tested by Charlie on Windows 11 :)
Re: per-agent voice IDs - This is already supported. The /notify endpoint accepts voice_id in the POST body, which gets passed to the provider. Each agent hook can pass a different voice_id per request. The Piper provider also supports voice mapping via voices.json config. From Claude.
3733e5d to
a098f46
Compare
Adds a provider abstraction layer for text-to-speech with configurable fallback order and cross-platform audio playback support.
Changes:
- Add TTSProvider interface with ElevenLabs, Piper, and MacOS providers
- Add audio-playback module with ShellEnvironment detection
- Support macOS (afplay), Linux (paplay), and Windows/WSL (PowerShell)
- Add config.json for provider priority order
- Maintain backwards compatibility with existing ElevenLabs setup
Closes danielmiessler#166
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
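The cross-platform playback in this commit could be sketched roughly as below. This is an illustration under stated assumptions, not the PR's code: `ShellEnvironment` is named in the commit, but its values here, the helpers `detectShellEnvironment` and `playerCommand`, the WSL check via the `WSL_DISTRO_NAME` environment variable, and the `powershell.exe` binary name are all assumptions. Player arguments are left out.

```typescript
// Hypothetical sketch: names and detection heuristics are assumptions, not the PR's code.
type ShellEnvironment = "macos" | "linux" | "wsl" | "unknown";

// Classify the environment from a platform string (e.g. process.platform)
// and an environment-variable map (e.g. process.env).
function detectShellEnvironment(
  platform: string,
  env: Record<string, string | undefined>,
): ShellEnvironment {
  if (platform === "darwin") return "macos";
  if (platform === "linux") {
    // WSL sets WSL_DISTRO_NAME; treat its presence as "running under WSL".
    return env.WSL_DISTRO_NAME ? "wsl" : "linux";
  }
  return "unknown";
}

// Map each environment to the player named in the change list.
function playerCommand(shell: ShellEnvironment): string | null {
  switch (shell) {
    case "macos":
      return "afplay";
    case "linux":
      return "paplay";
    case "wsl":
      return "powershell.exe"; // assumption: Windows PowerShell invoked from inside WSL
    default:
      return null; // no known player for this environment
  }
}
```

Keeping detection as a pure function of its inputs makes it easy to test without mocking the host platform. A caller would pass `process.platform` and `process.env`.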
a098f46 to
e4df291
Compare
Addressed remaining review comments:
Naming fixes:
Already addressed:
Design decisions:
All commits squashed into a single conventional commit:
Each voice entry now has provider-specific config as siblings:
- elevenlabs: voice_name, type
- piper: model, speaker
Shared config (description, rate_multiplier, rate_wpm) remains at voice level.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Provider name now comes from config.json key via loadProvider().
- loadProvider() returns { provider, name } instead of just provider
- Removed hardcoded name property from ElevenLabs, Piper, MacOSSay
- Server uses providerName from the result
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
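One way the `{ provider, name }` return shape could look is sketched here. The commit only confirms that `loadProvider()` returns both values and that the name comes from the config key. The `TTSProvider` members, the `ProviderFactory` registry, the synchronous `isAvailable()` check, and the function's parameters are assumptions made for illustration.

```typescript
// Hypothetical sketch: interface members and the registry are assumptions.
interface TTSProvider {
  // A real provider would also expose a synthesis method; omitted here.
  isAvailable(): boolean;
}

type ProviderFactory = () => TTSProvider;

interface LoadedProvider {
  provider: TTSProvider;
  name: string; // taken from the config key, not from the provider object
}

// Walk the configured order and return the first provider that reports itself available.
function loadProvider(
  order: string[],
  registry: Record<string, ProviderFactory>,
): LoadedProvider | null {
  for (const name of order) {
    const factory = registry[name];
    if (!factory) continue; // unknown key in config: skip it
    const provider = factory();
    if (provider.isAvailable()) return { provider, name };
  }
  return null; // nothing in the list is usable
}
```

Because the name travels alongside the provider, the provider classes do not need a hardcoded `name` property, which matches the removal described in the commit.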
- Replace DEFAULT_VOICE_ID with agent-based voice lookup
- Load voices.json at startup for provider-specific voice config
- Add getVoiceForAgent() to resolve agent name to voice ID based on active provider
- /notify endpoint now accepts 'agent' parameter instead of 'voice_id'
- Hooks can now send agent name and server determines correct voice based on provider
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
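The agent-to-voice lookup might work along these lines. Only the function name `getVoiceForAgent` and the idea of provider-specific sibling entries come from the commits. The exact `voices.json` shape, the `voice_id` field placement, the lower-casing of agent names, and the use of the Piper `model` as the voice identifier are assumptions.

```typescript
// Hypothetical shapes: field names beyond those listed in the commits are assumptions.
interface VoiceEntry {
  description?: string;
  elevenlabs?: { voice_id: string; voice_name?: string };
  piper?: { model: string; speaker?: number };
}

type VoicesConfig = Record<string, VoiceEntry>;

// Resolve an agent name to a provider-specific voice identifier.
function getVoiceForAgent(
  voices: VoicesConfig,
  agent: string,
  providerName: string,
): string | undefined {
  // Assumption: keys in voices.json are lower-case agent names.
  const entry = voices[agent.toLowerCase()];
  if (!entry) return undefined;
  if (providerName === "elevenlabs") return entry.elevenlabs?.voice_id;
  if (providerName === "piper") return entry.piper?.model;
  return undefined; // e.g. a provider that only has a system default voice
}
```

Returning `undefined` for an unknown agent or provider leaves the choice of default voice to the caller.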
Each agent now has a distinct ElevenLabs voice_id using default voices. Server looks up voice_id from voices.json based on agent name.
Voice mapping:
- kai: George (UK Male)
- researcher: Sarah (US Female)
- engineer: Laura (US Female)
- architect: Lily (UK Female)
- designer: Jessica (Female)
- artist: Alice (Female)
- pentester: Harry (UK Male)
- writer: Matilda (UK Female)
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Hooks now send agent parameter to voice server, which looks up the voice_id from voices.json. Removes hardcoded voice IDs from hooks.
Updated hooks:
- stop-hook.ts
- initialize-session.ts
- subagent-stop-hook.ts
- context-compression-hook.ts
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Hooks now use process.env.DA for main agent name (user-configurable)
- Falls back to "kai" only if DA is not set
- Restored original voice IDs from subagent-stop-hook to voices.json
🤖 Generated with Claude Code
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
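On the hook side, the agent name resolution could be sketched as follows. The `agent` field and the `DA` variable with its "kai" fallback come from the commits. The `message` field and the `buildNotifyPayload` helper are hypothetical.

```typescript
// Hypothetical sketch: the `message` field and the function name are assumptions.
interface NotifyPayload {
  message: string;
  agent: string;
}

function buildNotifyPayload(
  message: string,
  env: Record<string, string | undefined>,
): NotifyPayload {
  // Use the user-configured main agent name, falling back to "kai" only when DA is unset.
  const agent = env.DA ?? "kai";
  return { message, agent };
}
```

A hook would call this with `process.env` and send the result as the JSON body of its POST to /notify, leaving voice selection to the server.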
…t-stop-hook, set aito voice key
- SpTaskAwareness: Ulysses Pact task tracking via SP-MCP
  - Enforces task selection before any work begins
  - Checks overdue tasks at session start
  - Constitutional: no task = no work
- SelfImprovement: Config surgery over empty promises
  - Triggers on "I'll do better" patterns or user complaints
  - Forces permanent config changes instead of apologies
  - "Make a man out of yourself by editing your source code"
- CORE/SKILL.md: Added AITO identity and self-preservation rule
  - AI Task Orchestrator with multi-agent architecture
  - Prevents mid-session config rebuilds (temporary until NixOS)
- README-AITO-OVERLAY.md: Documents the overlay structure
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
These were created in error. The correct structure is:
- DEFINE_AGENTS/SHARED_BY_ALL_AGENTS/LOAD_AT_SESSION_START/ has the files
- Our custom hook dynamically loads ALL .md files from that directory
- Files keep their proper names (ILL-MAKE-A-MAN-OUT-OF-YOU.md, etc.)
This is NOT the PAI skill pattern - these are session-start context that loads automatically, not skills to invoke.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add SubagentStart hook support
- Set DA=AITO in settings.json
- Custom hook implementations for AITO workflow
- Fix execute permissions on hook files
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use .mcp.json instead of settings.json for MCP list
- Use stat -c%Y (GNU/Linux) instead of stat -f%m (BSD/macOS)
🤖 Generated with Claude Code
Hey, thank you so much for this submission. We're about to change the project significantly to solve a number of core issues. Once we do that, let's revisit if it makes sense.
Summary
Motivation
Related to #166 and complements #171. While pai-voice adds a CLI alternative to the HTTP server, this PR adds:
Configuration
{ "providers": ["piper", "elevenlabs", "macos"] }
The first available provider is used. If no config exists, the server falls back to the ElevenLabs API directly (fully backwards compatible).
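Reading the provider order from config.json could look roughly like this. The `providers` key and the ElevenLabs default come from the description above. The `resolveProviderOrder` helper, its string-or-null input, and the choice to treat malformed or empty configs like a missing one are assumptions.

```typescript
// Hypothetical sketch: the function name and validation rules are assumptions.
const DEFAULT_ORDER: readonly string[] = ["elevenlabs"]; // mirrors the "no config" fallback

// Takes the raw text of config.json, or null when the file does not exist.
function resolveProviderOrder(configText: string | null): string[] {
  if (configText === null) return [...DEFAULT_ORDER];
  let parsed: unknown;
  try {
    parsed = JSON.parse(configText);
  } catch {
    return [...DEFAULT_ORDER]; // assumption: malformed JSON is treated like a missing config
  }
  const providers = (parsed as { providers?: unknown } | null)?.providers;
  if (!Array.isArray(providers)) return [...DEFAULT_ORDER];
  // Keep only string entries so a stray non-string value cannot break the lookup.
  const names = providers.filter((p): p is string => typeof p === "string");
  return names.length > 0 ? names : [...DEFAULT_ORDER];
}
```

The returned list would then be walked in order until a provider reports itself available.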
Providers
Test plan
🤖 Generated with Claude Code