🌙 Voice-controlled interface for Claude Code CLI. Code by talking like in Black Mirror. Features: Whisper AI, VAD, session persistence, schizo mode for deep focus.

🌙 Claude Voice Assistant

Black Mirror Vibe Schizo Mode

⚠️ WARNING: This tool may cause severe Black Mirror vibes, existential contemplation about AI, and an irresistible urge to dictate everything by voice. Side effects include increased productivity and spontaneous deep-focus coding sessions. 👁️

The world's most blessed voice assistant for Claude Code is here! Finally, you can code by literally talking to your computer like in those sci-fi movies, except this actually works and won't try to kill you. Probably. 🤖

🎯 What The Heck Is This?

This is a voice-controlled interface for Claude Code CLI. Instead of typing like a peasant from 2023, you can now:

  • 🗣️ SPEAK YOUR CODE into existence
  • 🎤 DICTATE ENTIRE FEATURES while dramatically pacing around your room
  • 💬 HAVE PHILOSOPHICAL DEBATES with Claude about whether your code is art
  • 🧠 Enter SCHIZO MODE for deep, uninterrupted coding sessions
  • 👁️ Experience BLACK MIRROR VIBES as AI listens to your every word (but in a good way!)

No more keyboard! No more RSI! Just you, your voice, and Claude doing the actual work while you supervise like a tech CEO! 🎩

✨ Features (aka Why This is Absolutely Insane for Your Productivity)

🎤 Two Coding Modes

1. Conversation Mode 💬

Chat with Claude like he's your coding buddy who never gets tired of your questions.

  • Auto-Pause Detection: Speak naturally, AI figures out when you're done (Black Mirror tech!)
  • Enter-to-Send Mode: For those who prefer manual control (we respect consent!)
  • Voice Response Options: Claude can talk back or just text (your choice!)
  • Perfect for: Quick questions, code reviews, debugging sessions
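The auto-pause detection above can be sketched roughly like this. This is a minimal sketch with a stubbed-out speech check; the real implementation feeds live 30 ms microphone frames to webrtcvad, but the stop-on-long-pause logic is the same shape:

```python
# Minimal sketch of auto-pause detection: stop once we've seen enough
# speech followed by a long enough run of silent chunks. The "speech"
# flag stands in for webrtcvad.Vad.is_speech() on a real audio frame.

def record_until_pause(chunks, silence_limit=30, min_speech=10):
    """Return the chunks captured up to the first long pause after speech."""
    captured, speech_seen, silent_run = [], 0, 0
    for chunk in chunks:
        captured.append(chunk)
        if chunk["speech"]:          # stand-in for vad.is_speech(frame, rate)
            speech_seen += 1
            silent_run = 0
        else:
            silent_run += 1
        # 30 silent chunks x 30 ms each ~= a 900 ms pause
        if speech_seen >= min_speech and silent_run >= silence_limit:
            break
    return captured

# Simulated stream: 12 speech chunks, then continuous silence
stream = [{"speech": True}] * 12 + [{"speech": False}] * 100
print(len(record_until_pause(stream)))  # 42 (12 speech + 30 silence)
```

The `min_speech` guard keeps a cough or a door slam from triggering an empty transcription.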

2. Dictation Mode 📝 (The Schizo Special)

For when you need to go DEEP. Like, really deep.

  • Unlimited Dictation Length: Speak for as long as you want, no judgment
  • Session Persistence: Exit Claude and come back to the SAME conversation
  • Text Editing: Review and edit transcribed text before sending
  • Multi-Message Support: Add voice OR text messages to ongoing sessions
  • Perfect for: Complex features, long explanations, existential monologues about your codebase

🌟 Bonus Features

  • 🧠 Whisper AI Transcription (medium model, ~1.5GB)
  • 🔊 Text-to-Speech Responses (macOS say command with Russian Milena voice)
  • 💾 Project Directory Memory (remembers where you were vibing last time)
  • 🎯 VAD (Voice Activity Detection) for automatic pause detection
  • 🌍 Multi-language Support (Whisper auto-detects your language)
  • No Timeouts in Dictation Mode (code for hours if needed!)

📋 Prerequisites (The Boring but Necessary Stuff)

  1. macOS (because we use the say command)
  2. Python 3.9+ (anything older is not recommended)
  3. Claude Code CLI installed and authenticated
  4. Microphone (obviously, unless you have telepathy)
  5. ~2GB free space (for Whisper model on first run)
  6. Good vibes (essential, non-negotiable)

🚀 Installation (Let's Get This Bread)

Step 1: Clone This Beauty

git clone https://github.com/yourusername/claude-voice-assistant.git
cd claude-voice-assistant

Step 2: Install System Dependencies

# Install Homebrew if you don't have it (seriously, get it)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install portaudio for pyaudio
brew install portaudio

Step 3: Create Virtual Environment (Best Practice™)

python3 -m venv venv
source venv/bin/activate

Step 4: Install Python Dependencies

pip install -r requirements.txt

If pyaudio gives you trouble (it probably will):

# (Newer pip versions removed --global-option; pass the Homebrew paths via env vars)
CFLAGS="-I/opt/homebrew/include" LDFLAGS="-L/opt/homebrew/lib" pip install pyaudio

Dependencies include:

  • faster-whisper - Neural network for speech recognition
  • pyaudio - For capturing your beautiful voice
  • webrtcvad - Voice Activity Detection (knows when you're talking)
  • anthropic - For Claude API (if you want to extend it)
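For reference, a requirements.txt consistent with the list above might look like this (package names are from the list; versions are deliberately unpinned here, pin your own):

```
faster-whisper
pyaudio
webrtcvad
anthropic
```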

Step 5: Install Claude Code CLI

If you haven't already:

# Follow instructions at https://docs.claude.com/
# Make sure `claude --version` works

Step 6: Set Up Microphone Permissions

CRITICAL: macOS needs microphone access for Terminal/iTerm:

  1. Go to System Preferences → Security & Privacy → Privacy
  2. Click Microphone
  3. Check the box next to Terminal or iTerm2
  4. Restart your terminal if needed

🎮 Usage (Time to Vibe)

Basic Launch

cd ~/claude-voice-assistant
source venv/bin/activate
python voice_assistant.py

You'll see the main menu:

============================================================
🌙 CLAUDE VOICE ASSISTANT 🌙
============================================================
⚠️  Black Mirror x Vibe Coding x Schizo Mode 👁️
============================================================
Choose your vibe coding mode:
  [1] 💬 Conversation Mode - chat with AI (like Black Mirror)
  [2] 📝 Dictation Mode - long schizo task dictation
  [3] ⚙️  Configure project (change directory)
  [q] ❌ Exit vibe session
============================================================
💡 Tip: speak naturally! 🚀
============================================================

Mode 1: Conversation Mode 💬

Perfect for quick questions and rapid-fire coding sessions.

What to do:

  1. Choose option 1
  2. Select input method:
    • [1] Auto-pause detection (speak, wait, it detects your pause)
    • [2] Enter-to-send (speak, press Enter when done)
  3. Choose if you want voice responses (y or n)
  4. Start speaking your questions!
  5. Press Ctrl+C to exit

Example Session:

You: "Claude, what's wrong with this Python code?"
Claude: *analyzes your code and explains the issue*
You: "Thanks bro! How do I fix it?"
Claude: *provides solution*

Mode 2: Dictation Mode 📝 (The Schizo Experience)

For deep work and complex tasks. This is where the magic happens. ✨

What to do:

  1. Choose option 2
  2. Start recording and speak your task (can be VERY long)
  3. Press Enter when done
  4. Review the transcribed text:
    • [y] Send as is
    • [a] Add text (logs, commands, code snippets)
    • [e] Edit entire text
    • [n] Cancel
  5. Press Enter to launch Claude interactive session
  6. Work with Claude as long as needed
  7. Type /exit when done
  8. MAGIC MOMENT: You'll return to a menu where you can:
    • [d] Add another voice dictation to THE SAME session
    • [t] Add a text message to the session
    • [c] Continue working with Claude (no new message)
    • [m] Return to main menu
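The MAGIC MOMENT above presumably maps onto Claude Code's conversation-continuation flag (`claude --continue` resumes the most recent conversation). A minimal sketch of how the invocation could be assembled — the helper name `build_claude_command` is hypothetical, not from the script:

```python
# Hypothetical helper: build the CLI invocation for a new vs. continued
# session. Assumes Claude Code's `--continue` flag, which resumes the
# most recent conversation in the current directory.

def build_claude_command(message: str, continue_session: bool = False) -> list[str]:
    cmd = ["claude"]
    if continue_session:
        cmd.append("--continue")
    cmd.append(message)
    return cmd

print(build_claude_command("fix the login bug"))
# → ['claude', 'fix the login bug']
print(build_claude_command("also add tests", continue_session=True))
# → ['claude', '--continue', 'also add tests']
```

That second form is what options [d] and [t] would trigger: a fresh transcription, same conversation.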

Example Session:

You: *dictates for 3 minutes about a complex feature*
System: *shows transcribed text*
You: [a] to add logs
You: *pastes error logs*
You: [y] to send
Claude: *launches interactive mode and starts working*
*Claude fixes your code, runs tests, commits*
You: /exit
System: "What would you like to do?"
You: [d] to add more context
You: *records another message*
Claude: *continues in the SAME conversation*

Mode 3: Project Configuration ⚙️

Set your project directory. It'll be saved and remembered for next time!

Choose option: 3
Enter project directory path: ~/projects/my-awesome-app
✅ Directory changed to: /Users/you/projects/my-awesome-app
💾 Settings saved! This directory will be used on next launch.

🎪 Pro Tips (Become a Voice Coding Wizard)

1. Speak Clearly, But Naturally

  • Don't yell, Whisper AI is good but not deaf
  • Use natural pauses between thoughts
  • The model handles accents pretty well!

2. Dictation Mode is Your Friend for Big Tasks

  • Use it for feature requests, bug reports, architectural discussions
  • Session persistence means you can iterate without losing context
  • Perfect for pair programming with Claude

3. Conversation Mode for Quick Stuff

  • Fast questions, quick fixes, sanity checks
  • Enter-to-send mode is faster if you know what you want to say

4. Add Text for Precision

  • After voice transcription, add logs, error messages, code snippets as text
  • Best of both worlds: quick voice description + precise text data

5. Use Project Directory Feature

  • Set it once, forget about it
  • Claude will work in the right context every time

6. Voice Responses: Yay or Nay?

  • Voice responses are cool for learning/reviewing
  • Text-only is faster for rapid iteration
  • Your choice, we're not judging!

7. Quiet Environment = Better Results

  • Use headphones to prevent echo
  • Find a quiet spot for best transcription
  • Background noise confuses Whisper (it's sensitive, ok?)

🐛 Troubleshooting (When Things Go Sideways)

"❌ No audio recorded!"

Solutions:

  1. Check microphone permissions (System Preferences → Security & Privacy → Privacy → Microphone)
  2. Make sure you're speaking for at least 2-3 seconds
  3. Check if Terminal/iTerm has microphone access
  4. Try unplugging/replugging external mics
  5. Restart Terminal and try again

"❌ Claude Code CLI not found!"

Solution:

# Install Claude Code CLI
# Visit: https://docs.claude.com/
# Make sure it's in your PATH
claude --version  # Should work

"The transcription is terrible!"

Solutions:

  • Speak more clearly (sorry, AI isn't perfect yet)
  • Reduce background noise
  • Use a better microphone
  • Try a larger Whisper model (edit the code to use "large-v3" instead of "medium")
  • Check if your language is supported by Whisper

"It's too slow!"

Solutions:

  • First run downloads the model (~1.5GB), be patient
  • Use faster hardware (M1/M2 Macs work great)
  • Switch to "small" model for speed (less accuracy though)
  • Make sure you're not running other heavy tasks

"Whisper model download fails (403 error)"

Solution:

# Sometimes Hugging Face is moody
# Just run the app again, it usually works on retry
python voice_assistant.py

"pyaudio won't install!"

Solutions:

# Make sure portaudio is installed
brew install portaudio

# Try with explicit Homebrew paths (newer pip removed --global-option)
CFLAGS="-I/opt/homebrew/include" LDFLAGS="-L/opt/homebrew/lib" pip install pyaudio

# If still failing, try conda (last resort)
conda install -c conda-forge pyaudio

"I want to use a different voice for responses!"

Edit the code:

# In voice_assistant.py, find the text_to_speech method (line ~324)
def text_to_speech(self, text: str):
    subprocess.run(
        ["say", "-v", "Alex", text],  # Default is "Milena"; swap in any macOS voice
        check=False
    )

Available voices:

say -v "?"  # Lists all available voices

🔧 Advanced Configuration

Change Whisper Model Size

Edit voice_assistant.py line ~837:

# Default (balanced)
assistant = VoiceAssistant(model_size="medium")

# Faster but less accurate
assistant = VoiceAssistant(model_size="small")

# Slower but more accurate
assistant = VoiceAssistant(model_size="large-v3")

Adjust VAD Sensitivity

Edit line ~294:

# Aggressiveness 0-3: higher filters non-speech more strictly (fewer false triggers)
self.vad = webrtcvad.Vad(3)  # Default: 3

Change Pause Detection Timing

Edit line ~363:

silence_threshold = 30  # Number of 30ms chunks (default: 900ms pause)
speech_chunks_required = 10  # Minimum speech before processing
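To sanity-check new values, the pause length in milliseconds is just chunks × chunk duration. A quick sketch of that arithmetic (the 30 ms chunk size matches the comment above):

```python
CHUNK_MS = 30  # each audio chunk covers 30 ms, per the VAD frame size

def pause_ms(silence_threshold: int, chunk_ms: int = CHUNK_MS) -> int:
    """Milliseconds of silence needed before recording stops."""
    return silence_threshold * chunk_ms

print(pause_ms(30))  # default: 900 ms
print(pause_ms(50))  # a more patient 1500 ms pause
```

Raising `silence_threshold` to 50 gives you a full 1.5 seconds to think mid-sentence before the recorder cuts you off.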

🤝 Contributing (Join the Vibe)

Pull requests are welcome! For major changes:

  1. Fork the repo
  2. Create a branch (git checkout -b feature/awesome-feature)
  3. Make your changes
  4. Add tests if you're feeling responsible
  5. Commit with style (git commit -m "Added telepathic mode (just kidding)")
  6. Push and create a PR

Please keep the humor spicy and the code clean! 🧼

📜 License

MIT License - Use it, abuse it, just don't blame us if you become too productive and your boss expects this level of output all the time. 😅

🎨 Tech Stack (For The Nerds)

  • faster-whisper - Local speech-to-text (OpenAI Whisper optimized)
  • webrtcvad - Voice Activity Detection for pause detection
  • pyaudio - Audio recording from microphone
  • macOS say - Text-to-speech for responses
  • Claude Code CLI - AI assistant interface
  • Python 3.9+ - Because we're not savages

🙏 Credits & Acknowledgments

  • Anthropic - For Claude and the amazing Claude Code CLI
  • OpenAI - For Whisper (the transcription model)
  • faster-whisper - For making Whisper actually fast
  • The entire open-source community - Y'all are the real MVPs
  • Coffee - For existing ☕
  • That Black Mirror episode - You know the one 👁️

🌊 Vibe Coding Glossary

For those wondering what all these terms mean:

  • Vibe - The energy, the flow, the zone
  • Schizo mode - Deep focus state (not actually schizophrenia, just intense concentration)
  • Black Mirror vibes - That sci-fi feeling when AI actually works
  • Voice coding - Literally coding by talking to your computer

📝 Roadmap (Maybe)

  • Hotkeys for quick mode switching
  • Conversation history saving
  • Support for more TTS voices
  • GUI version (if terminal gets old)
  • Support for other AI assistants (GPT, etc.)
  • Windows/Linux support (if anyone cares)
  • Telepathy mode (v2.0, probably)
  • Time travel debugging (v3.0, definitely)

🌊 Final Words

Remember: This tool is powerful, use it responsibly. Speak clearly, code boldly, and may your bugs be few and your commits many!

Happy coding! 🚀🌙


Made with 💚 and blessed vibes by developers who got tired of typing

P.S. - If this project helped you, consider:

  • ⭐ Starring the repo (please, my ego needs it)
  • 🐛 Reporting bugs (but nicely, I have feelings)
  • 💡 Suggesting features (the crazier the better)
  • 🎤 Telling your friends about voice coding (they'll think you're from the future)

P.P.S. - No, it doesn't actually read your mind. Yet. That's v2.0. 🧠✨

P.P.P.S. - Yes, the Black Mirror references are intentional. No, the assistant won't take over your life. Probably. 👁️
