🌙 Voice-controlled interface for Claude Code CLI. Code by talking like in Black Mirror. Features: Whisper AI, VAD, session persistence, schizo mode for deep focus.

🌙 Claude Voice Assistant

Black Mirror Vibe Schizo Mode

⚠️ WARNING: This tool may cause severe Black Mirror vibes, existential contemplation about AI, and an irresistible urge to dictate everything by voice. Side effects include increased productivity and spontaneous deep-focus coding sessions. 👁️

The world's most blessed voice assistant for Claude Code is here! Finally, you can code by literally talking to your computer like in those sci-fi movies, except this actually works and won't try to kill you. Probably. 🤖

🎯 What The Heck Is This?

This is a voice-controlled interface for Claude Code CLI. Instead of typing like a peasant from 2023, you can now:

  • 🗣️ SPEAK YOUR CODE into existence
  • 🎤 DICTATE ENTIRE FEATURES while dramatically pacing around your room
  • 💬 HAVE PHILOSOPHICAL DEBATES with Claude about whether your code is art
  • 🧠 Enter SCHIZO MODE for deep, uninterrupted coding sessions
  • 👁️ Experience BLACK MIRROR VIBES as AI listens to your every word (but in a good way!)

No more keyboard! No more RSI! Just you, your voice, and Claude doing the actual work while you supervise like a tech CEO! 🎩

✨ Features (aka Why This is Absolutely Insane for Your Productivity)

🎤 Two Coding Modes

1. Conversation Mode 💬

Chat with Claude like he's your coding buddy who never gets tired of your questions.

  • Auto-Pause Detection: Speak naturally, AI figures out when you're done (Black Mirror tech!)
  • Enter-to-Send Mode: For those who prefer manual control (we respect consent!)
  • Voice Response Options: Claude can talk back or just text (your choice!)
  • Perfect for: Quick questions, code reviews, debugging sessions
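The auto-pause detection above can be sketched roughly like this. This is a minimal sketch with a stubbed-out speech check; the real implementation feeds live 30 ms microphone frames to webrtcvad, but the stop-on-long-pause logic is the same shape:

```python
# Minimal sketch of auto-pause detection: stop once we've seen enough
# speech followed by a long enough run of silent chunks. The "speech"
# flag stands in for webrtcvad.Vad.is_speech() on a real audio frame.

def record_until_pause(chunks, silence_limit=30, min_speech=10):
    """Return the chunks captured up to the first long pause after speech."""
    captured, speech_seen, silent_run = [], 0, 0
    for chunk in chunks:
        captured.append(chunk)
        if chunk["speech"]:          # stand-in for vad.is_speech(frame, rate)
            speech_seen += 1
            silent_run = 0
        else:
            silent_run += 1
        # 30 silent chunks x 30 ms each ~= a 900 ms pause
        if speech_seen >= min_speech and silent_run >= silence_limit:
            break
    return captured

# Simulated stream: 12 speech chunks, then continuous silence
stream = [{"speech": True}] * 12 + [{"speech": False}] * 100
print(len(record_until_pause(stream)))  # 42 (12 speech + 30 silence)
```

The `min_speech` guard keeps a cough or a door slam from triggering an empty transcription.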

2. Dictation Mode 📝 (The Schizo Special)

For when you need to go DEEP. Like, really deep.

  • Unlimited Dictation Length: Speak for as long as you want, no judgment
  • Session Persistence: Exit Claude and come back to the SAME conversation
  • Text Editing: Review and edit transcribed text before sending
  • Multi-Message Support: Add voice OR text messages to ongoing sessions
  • Perfect for: Complex features, long explanations, existential monologues about your codebase

🌟 Bonus Features

  • 🧠 Whisper AI Transcription (medium model, ~1.5GB)
  • 🔊 Text-to-Speech Responses (macOS say command with Russian Milena voice)
  • 💾 Project Directory Memory (remembers where you were vibing last time)
  • 🎯 VAD (Voice Activity Detection) for automatic pause detection
  • 🌍 Multi-language Support (Whisper auto-detects your language)
  • No Timeouts in Dictation Mode (code for hours if needed!)

📋 Prerequisites (The Boring but Necessary Stuff)

  1. macOS (because we use the say command)
  2. Python 3.9+ (anything older is not recommended)
  3. Claude Code CLI installed and authenticated
  4. Microphone (obviously, unless you have telepathy)
  5. ~2GB free space (for Whisper model on first run)
  6. Good vibes (essential, non-negotiable)

🚀 Installation (Let's Get This Bread)

Step 1: Clone This Beauty

git clone https://github.com/yourusername/claude-voice-assistant.git
cd claude-voice-assistant

Step 2: Install System Dependencies

# Install Homebrew if you don't have it (seriously, get it)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install portaudio for pyaudio
brew install portaudio

Step 3: Create Virtual Environment (Best Practice™)

python3 -m venv venv
source venv/bin/activate

Step 4: Install Python Dependencies

pip install -r requirements.txt

If pyaudio gives you trouble (it probably will):

# (Newer pip versions removed --global-option; pass the Homebrew paths via env vars)
CFLAGS="-I/opt/homebrew/include" LDFLAGS="-L/opt/homebrew/lib" pip install pyaudio

Dependencies include:

  • faster-whisper - Neural network for speech recognition
  • pyaudio - For capturing your beautiful voice
  • webrtcvad - Voice Activity Detection (knows when you're talking)
  • anthropic - For Claude API (if you want to extend it)
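For reference, a requirements.txt consistent with the list above might look like this (package names are from the list; versions are deliberately unpinned here, pin your own):

```
faster-whisper
pyaudio
webrtcvad
anthropic
```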

Step 5: Install Claude Code CLI

If you haven't already:

# Follow instructions at https://docs.claude.com/
# Make sure `claude --version` works

Step 6: Set Up Microphone Permissions

CRITICAL: macOS needs microphone access for Terminal/iTerm:

  1. Go to System Preferences → Security & Privacy → Privacy
  2. Click Microphone
  3. Check the box next to Terminal or iTerm2
  4. Restart your terminal if needed

🎮 Usage (Time to Vibe)

Basic Launch

cd ~/claude-voice-assistant
source venv/bin/activate
python voice_assistant.py

You'll see the main menu:

============================================================
🌙 CLAUDE VOICE ASSISTANT 🌙
============================================================
⚠️  Black Mirror x Vibe Coding x Schizo Mode 👁️
============================================================
Choose your vibe coding mode:
  [1] 💬 Conversation Mode - chat with AI (like Black Mirror)
  [2] 📝 Dictation Mode - long schizo task dictation
  [3] ⚙️  Configure project (change directory)
  [q] ❌ Exit vibe session
============================================================
💡 Tip: speak naturally! 🚀
============================================================

Mode 1: Conversation Mode 💬

Perfect for quick questions and rapid-fire coding sessions.

What to do:

  1. Choose option 1
  2. Select input method:
    • [1] Auto-pause detection (speak, wait, it detects your pause)
    • [2] Enter-to-send (speak, press Enter when done)
  3. Choose if you want voice responses (y or n)
  4. Start speaking your questions!
  5. Press Ctrl+C to exit

Example Session:

You: "Claude, what's wrong with this Python code?"
Claude: *analyzes your code and explains the issue*
You: "Thanks bro! How do I fix it?"
Claude: *provides solution*

Mode 2: Dictation Mode 📝 (The Schizo Experience)

For deep work and complex tasks. This is where the magic happens. ✨

What to do:

  1. Choose option 2
  2. Start recording and speak your task (can be VERY long)
  3. Press Enter when done
  4. Review the transcribed text:
    • [y] Send as is
    • [a] Add text (logs, commands, code snippets)
    • [e] Edit entire text
    • [n] Cancel
  5. Press Enter to launch Claude interactive session
  6. Work with Claude as long as needed
  7. Type /exit when done
  8. MAGIC MOMENT: You'll return to a menu where you can:
    • [d] Add another voice dictation to THE SAME session
    • [t] Add a text message to the session
    • [c] Continue working with Claude (no new message)
    • [m] Return to main menu
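The MAGIC MOMENT above presumably maps onto Claude Code's conversation-continuation flag (`claude --continue` resumes the most recent conversation). A minimal sketch of how the invocation could be assembled — the helper name `build_claude_command` is hypothetical, not from the script:

```python
# Hypothetical helper: build the CLI invocation for a new vs. continued
# session. Assumes Claude Code's `--continue` flag, which resumes the
# most recent conversation in the current directory.

def build_claude_command(message: str, continue_session: bool = False) -> list[str]:
    cmd = ["claude"]
    if continue_session:
        cmd.append("--continue")
    cmd.append(message)
    return cmd

print(build_claude_command("fix the login bug"))
# → ['claude', 'fix the login bug']
print(build_claude_command("also add tests", continue_session=True))
# → ['claude', '--continue', 'also add tests']
```

That second form is what options [d] and [t] would trigger: a fresh transcription, same conversation.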

Example Session:

You: *dictates for 3 minutes about a complex feature*
System: *shows transcribed text*
You: [a] to add logs
You: *pastes error logs*
You: [y] to send
Claude: *launches interactive mode and starts working*
*Claude fixes your code, runs tests, commits*
You: /exit
System: "What would you like to do?"
You: [d] to add more context
You: *records another message*
Claude: *continues in the SAME conversation*

Mode 3: Project Configuration ⚙️

Set your project directory. It'll be saved and remembered for next time!

Choose option: 3
Enter project directory path: ~/projects/my-awesome-app
✅ Directory changed to: /Users/you/projects/my-awesome-app
💾 Settings saved! This directory will be used on next launch.

🎪 Pro Tips (Become a Voice Coding Wizard)

1. Speak Clearly, But Naturally

  • Don't yell, Whisper AI is good but not deaf
  • Use natural pauses between thoughts
  • The model handles accents pretty well!

2. Dictation Mode is Your Friend for Big Tasks

  • Use it for feature requests, bug reports, architectural discussions
  • Session persistence means you can iterate without losing context
  • Perfect for pair programming with Claude

3. Conversation Mode for Quick Stuff

  • Fast questions, quick fixes, sanity checks
  • Enter-to-send mode is faster if you know what you want to say

4. Add Text for Precision

  • After voice transcription, add logs, error messages, code snippets as text
  • Best of both worlds: quick voice description + precise text data

5. Use Project Directory Feature

  • Set it once, forget about it
  • Claude will work in the right context every time

6. Voice Responses: Yay or Nay?

  • Voice responses are cool for learning/reviewing
  • Text-only is faster for rapid iteration
  • Your choice, we're not judging!

7. Quiet Environment = Better Results

  • Use headphones to prevent echo
  • Find a quiet spot for best transcription
  • Background noise confuses Whisper (it's sensitive, ok?)

🐛 Troubleshooting (When Things Go Sideways)

"❌ No audio recorded!"

Solutions:

  1. Check microphone permissions (System Preferences → Security & Privacy → Privacy → Microphone)
  2. Make sure you're speaking for at least 2-3 seconds
  3. Check if Terminal/iTerm has microphone access
  4. Try unplugging/replugging external mics
  5. Restart Terminal and try again

"❌ Claude Code CLI not found!"

Solution:

# Install Claude Code CLI
# Visit: https://docs.claude.com/
# Make sure it's in your PATH
claude --version  # Should work

"The transcription is terrible!"

Solutions:

  • Speak more clearly (sorry, AI isn't perfect yet)
  • Reduce background noise
  • Use a better microphone
  • Try a larger Whisper model (edit the code to use "large-v3" instead of "medium")
  • Check if your language is supported by Whisper

"It's too slow!"

Solutions:

  • First run downloads the model (~1.5GB), be patient
  • Use faster hardware (M1/M2 Macs work great)
  • Switch to "small" model for speed (less accuracy though)
  • Make sure you're not running other heavy tasks

"Whisper model download fails (403 error)"

Solution:

# Sometimes Hugging Face is moody
# Just run the app again, it usually works on retry
python voice_assistant.py

"pyaudio won't install!"

Solutions:

# Make sure portaudio is installed
brew install portaudio

# Try with explicit Homebrew paths (newer pip removed --global-option)
CFLAGS="-I/opt/homebrew/include" LDFLAGS="-L/opt/homebrew/lib" pip install pyaudio

# If still failing, try conda (last resort)
conda install -c conda-forge pyaudio

"I want to use a different voice for responses!"

Edit the code:

# In voice_assistant.py, find the text_to_speech method (line ~324)
def text_to_speech(self, text: str):
    subprocess.run(
        ["say", "-v", "Alex", text],  # Default is "Milena"; swap in any macOS voice
        check=False
    )

Available voices:

say -v "?"  # Lists all available voices

🔧 Advanced Configuration

Change Whisper Model Size

Edit voice_assistant.py line ~837:

# Default (balanced)
assistant = VoiceAssistant(model_size="medium")

# Faster but less accurate
assistant = VoiceAssistant(model_size="small")

# Slower but more accurate
assistant = VoiceAssistant(model_size="large-v3")

Adjust VAD Sensitivity

Edit line ~294:

# Aggressiveness 0-3: higher filters non-speech more strictly (fewer false triggers)
self.vad = webrtcvad.Vad(3)  # Default: 3

Change Pause Detection Timing

Edit line ~363:

silence_threshold = 30  # Number of 30ms chunks (default: 900ms pause)
speech_chunks_required = 10  # Minimum speech before processing
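To sanity-check new values, the pause length in milliseconds is just chunks × chunk duration. A quick sketch of that arithmetic (the 30 ms chunk size matches the comment above):

```python
CHUNK_MS = 30  # each audio chunk covers 30 ms, per the VAD frame size

def pause_ms(silence_threshold: int, chunk_ms: int = CHUNK_MS) -> int:
    """Milliseconds of silence needed before recording stops."""
    return silence_threshold * chunk_ms

print(pause_ms(30))  # default: 900 ms
print(pause_ms(50))  # a more patient 1500 ms pause
```

Raising `silence_threshold` to 50 gives you a full 1.5 seconds to think mid-sentence before the recorder cuts you off.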

🤝 Contributing (Join the Vibe)

Pull requests are welcome! For major changes:

  1. Fork the repo
  2. Create a branch (git checkout -b feature/awesome-feature)
  3. Make your changes
  4. Add tests if you're feeling responsible
  5. Commit with style (git commit -m "Added telepathic mode (just kidding)")
  6. Push and create a PR

Please keep the humor spicy and the code clean! 🧼

📜 License

MIT License - Use it, abuse it, just don't blame us if you become too productive and your boss expects this level of output all the time. 😅

🎨 Tech Stack (For The Nerds)

  • faster-whisper - Local speech-to-text (OpenAI Whisper optimized)
  • webrtcvad - Voice Activity Detection for pause detection
  • pyaudio - Audio recording from microphone
  • macOS say - Text-to-speech for responses
  • Claude Code CLI - AI assistant interface
  • Python 3.9+ - Because we're not savages

🙏 Credits & Acknowledgments

  • Anthropic - For Claude and the amazing Claude Code CLI
  • OpenAI - For Whisper (the transcription model)
  • faster-whisper - For making Whisper actually fast
  • The entire open-source community - Y'all are the real MVPs
  • Coffee - For existing ☕
  • That Black Mirror episode - You know the one 👁️

🌊 Vibe Coding Glossary

For those wondering what all these terms mean:

  • Vibe - The energy, the flow, the zone
  • Schizo mode - Deep focus state (not actually schizophrenia, just intense concentration)
  • Black Mirror vibes - That sci-fi feeling when AI actually works
  • Voice coding - Literally coding by talking to your computer

📝 Roadmap (Maybe)

  • Hotkeys for quick mode switching
  • Conversation history saving
  • Support for more TTS voices
  • GUI version (if terminal gets old)
  • Support for other AI assistants (GPT, etc.)
  • Windows/Linux support (if anyone cares)
  • Telepathy mode (v2.0, probably)
  • Time travel debugging (v3.0, definitely)

🌊 Final Words

Remember: This tool is powerful, use it responsibly. Speak clearly, code boldly, and may your bugs be few and your commits many!

Happy coding! 🚀🌙


Made with 💚 and blessed vibes by developers who got tired of typing

P.S. - If this project helped you, consider:

  • ⭐ Starring the repo (please, my ego needs it)
  • 🐛 Reporting bugs (but nicely, I have feelings)
  • 💡 Suggesting features (the crazier the better)
  • 🎤 Telling your friends about voice coding (they'll think you're from the future)

P.P.S. - No, it doesn't actually read your mind. Yet. That's v2.0. 🧠✨

P.P.P.S. - Yes, the Black Mirror references are intentional. No, the assistant won't take over your life. Probably. 👁️
