Releases: basnijholt/agent-cli
v0.45.0
🎉 Gemini TTS Support
This release adds Google Gemini as a TTS provider, so Gemini is now available for all three AI services (ASR, LLM, TTS).
✨ New Features
- Gemini TTS provider: Use Gemini's native text-to-speech across all voice commands:
  agent-cli speak "Hello world" --tts-provider gemini
  agent-cli chat --tts-provider gemini
  agent-cli assistant --tts-provider gemini
  agent-cli voice-edit --tts-provider gemini
- New CLI options: --tts-gemini-model and --tts-gemini-voice
- Default model: gemini-2.5-flash-preview-tts
- Available voices: Kore (default), Puck, Charon, Fenrir, and more
🔧 Improvements
- Consolidated PCM-to-WAV conversion into a single reusable function
- Documentation fixes for markdown formatting and emoji rendering
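As an illustration of the PCM-to-WAV consolidation mentioned above, here is a minimal sketch of what a single reusable helper could look like, using only Python's standard `wave` module. The function name `pcm_to_wav` and its default parameters (24 kHz, mono, 16-bit) are assumptions made for this example and are not taken from the agent-cli source.

```python
import io
import wave


def pcm_to_wav(
    pcm: bytes,
    sample_rate: int = 24000,  # assumed default, not taken from the release notes
    channels: int = 1,
    sample_width: int = 2,  # bytes per sample, i.e. 16-bit PCM
) -> bytes:
    """Wrap raw PCM frames in a WAV container and return the WAV bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    # The wave module does not close a file object it did not open itself,
    # so the buffer is still readable here.
    return buffer.getvalue()
```

Keeping the conversion in one place means every provider that returns raw PCM can share the same header logic instead of each building its own.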
📚 Full Changelog
Full Changelog: v0.44.0...v0.45.0
v0.44.0
What's New
✨ Features
- Gemini ASR Support: Added Google Gemini as an ASR provider using native audio understanding capabilities (#177)
- Documentation Site: New Zensical-powered documentation site with auto-generated option tables (#157, #159, #162)
📚 Documentation
- Simplified Windows installation with native Windows support (#170)
- Added iOS Shortcut Guide to documentation (#165)
- Various improvements: GitHub-flavored admonitions, syntax highlighting, updated logo (#166, #169, #173-176)
🔧 Improvements
- Fixed VAD tests on Windows to prevent torch hang (#156)
- Cleaned up scripts directory and inlined zellij help text (#171, #172)
- DRY refactoring in docs generation (#164)
Full Changelog: v0.43.0...v0.44.0
v0.43.0
What's New
macOS Service Improvements
- Ollama now runs as a brew service (#155): Instead of manually managing ollama serve, Ollama now runs as a proper background service via brew services. The Zellij dashboard shows service status and control commands when running as a brew service.
- Whisper runs as a launchd service on Apple Silicon (#154): On ARM Macs, wyoming-mlx-whisper now runs as a native launchd service instead of a manual process, improving reliability and startup behavior.
Bug Fixes
- Fixed Windows CI test hang (#152): Resolved an issue where sounddevice's Pa_Initialize could hang during tests on Windows CI.
v0.42.0
🎙️ New Feature: Continuous Transcription Daemon
This release adds transcribe-daemon, a background service that continuously captures audio and automatically transcribes speech segments using voice activity detection (VAD).
Features
- Voice Activity Detection: Uses Silero VAD (neural network-based) for real-time speech/silence detection
- Pre-speech Buffer: Captures 300ms of audio before speech is detected to avoid missing word beginnings
- Automatic Segmentation: Segments audio based on configurable silence threshold (default 1s)
- Optional LLM Processing: Cleanup and formatting of transcriptions using existing prompts
- Audio Storage: Saves segments as MP3 files organized by date (~/.config/agent-cli/audio/YYYY/MM/DD/)
- JSON Lines Logging: Logs transcriptions with timestamps, role, raw/processed text, audio file paths
- Systemd Integration: Easy service installation for always-on transcription
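The pre-speech buffer and silence-based segmentation listed above can be sketched roughly as follows. This is not the daemon's actual implementation: the function `segment_speech`, its parameters, and the `is_speech` callable (a stand-in for a real VAD model such as Silero) are hypothetical names chosen for the example. Only the 300 ms pre-speech window and the 1 s silence threshold come from the release notes.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def segment_speech(
    chunks: Iterable[bytes],
    is_speech: Callable[[bytes], bool],  # stand-in for a VAD model (assumption)
    chunk_ms: int = 100,  # assumed chunk duration
    pre_speech_ms: int = 300,
    silence_threshold_ms: int = 1000,
) -> Iterator[bytes]:
    """Yield one bytes object per detected speech segment."""
    # Ring buffer holding the most recent non-speech chunks.
    pre_roll = deque(maxlen=max(1, pre_speech_ms // chunk_ms))
    segment: List[bytes] = []
    silence_ms = 0
    in_speech = False

    for chunk in chunks:
        speech = is_speech(chunk)
        if not in_speech:
            if speech:
                # Start a segment and prepend the buffered pre-speech audio
                # so the beginning of the first word is not cut off.
                segment = [*pre_roll, chunk]
                pre_roll.clear()
                in_speech = True
                silence_ms = 0
            else:
                pre_roll.append(chunk)
            continue

        segment.append(chunk)
        silence_ms = 0 if speech else silence_ms + chunk_ms
        if silence_ms >= silence_threshold_ms:
            # Enough trailing silence: close the segment.
            yield b"".join(segment)
            segment, in_speech, silence_ms = [], False, 0

    if in_speech:
        # Flush a segment that was still open when the stream ended.
        yield b"".join(segment)
```

In this sketch the trailing silence is kept in the emitted segment; a real daemon might trim it before transcription.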
Installation
# Install with VAD dependency
pip install 'agent-cli[vad]'
# or
uv tool install 'agent-cli[vad]'
Usage
# Basic daemon
agent-cli transcribe-daemon
# With custom role and silence threshold
agent-cli transcribe-daemon --role meeting --silence-threshold 1.5
# With LLM cleanup
agent-cli transcribe-daemon --llm --role notes
Full Changelog: v0.41.0...v0.42.0
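To make the date-organized audio storage and JSON Lines logging of transcribe-daemon more concrete, here is a small sketch of how such a layout could be produced. The helper names, the time-based file name, and the JSON field names are assumptions for illustration. The release notes only state that segments are stored under YYYY/MM/DD directories and that log entries carry timestamps, role, raw/processed text, and audio file paths.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional


def audio_path_for(base_dir: Path, when: datetime, suffix: str = ".mp3") -> Path:
    """Build base_dir/YYYY/MM/DD/<HHMMSS>.mp3 (the file name is hypothetical)."""
    return base_dir / f"{when:%Y}" / f"{when:%m}" / f"{when:%d}" / f"{when:%H%M%S}{suffix}"


def append_log_entry(
    log_file: Path,
    when: datetime,
    role: str,
    raw_text: str,
    processed_text: Optional[str],
    audio_file: Path,
) -> dict:
    """Append one transcription record as a single JSON line."""
    entry = {  # field names are illustrative, not the daemon's real schema
        "timestamp": when.isoformat(),
        "role": role,
        "raw": raw_text,
        "processed": processed_text,
        "audio_file": str(audio_file),
    }
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry
```

One record per line keeps the log appendable and easy to process with line-oriented tools.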
v0.41.0
What’s Changed
- feat(memory): use JSON mode for fact reconciliation (#146) @basnijholt
- feat(memory): add list_all method to MemoryClient (#147) @basnijholt
- feat(chroma): add batched upsert to prevent embedding service overload (#145) @basnijholt
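The idea behind a batched upsert can be sketched as below: instead of sending every document to the vector store (and therefore to the embedding service) in one call, the data is split into fixed-size slices. The function `upsert_in_batches`, the `upsert` callable, and the default batch size are assumptions for this example, not the code from #145.

```python
from typing import Callable, Sequence


def upsert_in_batches(
    upsert: Callable[..., None],  # e.g. a collection's upsert method (assumption)
    ids: Sequence[str],
    documents: Sequence[str],
    batch_size: int = 10,  # illustrative default
) -> int:
    """Call `upsert` once per slice of at most `batch_size` items.

    Returns the number of upsert calls that were made.
    """
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    calls = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        upsert(ids=list(ids[start:end]), documents=list(documents[start:end]))
        calls += 1
    return calls
```

Smaller batches bound the size of each embedding request, at the cost of more round trips.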
v0.40.0
What’s Changed
- perf(rag): use mtime-first checking for faster startup (#144) @basnijholt
- Update uv.lock (#143) @basnijholt
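A mtime-first check, as named in the perf(rag) entry above, generally means comparing a file's modification time before doing anything more expensive such as hashing its content. The sketch below shows one way this could work. The function `needs_reindex` and the cache layout are hypothetical and not taken from #144.

```python
import hashlib
from pathlib import Path
from typing import Dict, Tuple


def needs_reindex(path: Path, cache: Dict[str, Tuple[float, str]]) -> bool:
    """Return True if the file content changed since it was last seen.

    `cache` maps a file path to (mtime, sha256 hex digest) and is updated in place.
    """
    key = str(path)
    mtime = path.stat().st_mtime
    cached = cache.get(key)

    # Fast path: identical mtime, so skip reading and hashing the file.
    if cached is not None and cached[0] == mtime:
        return False

    # Slow path: mtime differs (or file is new), so confirm with a content hash.
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    changed = cached is None or cached[1] != digest
    cache[key] = (mtime, digest)
    return changed
```

On startup, most files are unchanged, so the fast path avoids reading them at all, which is where the time savings would come from.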
v0.39.1
What’s Changed
- docs: run proxy in foreground for visible logs (#142) @basnijholt
- docs: disable Open WebUI auth for instant demo access (#141) @basnijholt
- docs: add 'Try It Now' quick start examples (#136) @basnijholt
- fix: set default log level to INFO for proxies (#140) @basnijholt
- feat: add version and -v/--version CLI flag (#138) @basnijholt
v0.39.0
What’s Changed
- refactor: move OnnxCrossEncoder to shared core module (#137) @basnijholt
- docs: add high-level overviews to architecture docs (#135) @basnijholt
- [pre-commit.ci] pre-commit autoupdate (#133) @pre-commit-ci[bot]
- feat(rag): use XML tags for context formatting (#131) @basnijholt
- feat(rag): add confidence threshold and fallback prompt (#130) @basnijholt
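The two RAG changes above (XML tags for context, and a confidence threshold with a fallback prompt) can be illustrated together. In this sketch the tag names, the `min_score` default, the prompt wording, and the function `build_prompt` are all assumptions made for the example. The actual templates in #130 and #131 may differ.

```python
from typing import List, Tuple
from xml.sax.saxutils import escape, quoteattr

# Hypothetical fallback used when no retrieved chunk is confident enough.
FALLBACK_PROMPT = (
    "No relevant documents were found. Answer from general knowledge "
    "and say so explicitly.\n\nQuestion: {question}"
)


def build_prompt(
    question: str,
    hits: List[Tuple[str, str, float]],  # (source, text, relevance score)
    min_score: float = 0.5,  # illustrative threshold
) -> str:
    """Format retrieved chunks as XML-tagged context, or fall back."""
    relevant = [hit for hit in hits if hit[2] >= min_score]
    if not relevant:
        return FALLBACK_PROMPT.format(question=question)

    documents = "\n".join(
        f"<document source={quoteattr(source)}>\n{escape(text)}\n</document>"
        for source, text, _score in relevant
    )
    return f"<context>\n{documents}\n</context>\n\nQuestion: {question}"
```

Explicit tags give the model an unambiguous boundary between retrieved text and the user's question, and the threshold keeps weak matches out of the context.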
v0.38.0
What's Changed
- feat(whisper): use MLX-based Whisper on macOS for Apple Silicon by @basnijholt in #123
- feat: add stop file mechanism for graceful shutdown on Windows by @basnijholt in #127
- refactor: simplify Windows process management with psutil by @basnijholt in #129
Full Changelog: v0.37.1...v0.38.0
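A stop file mechanism, as introduced in #127, typically works by having the long-running process poll for the existence of a marker file and exit cleanly once it appears. This is useful on Windows, where terminating a process from outside does not give it a chance to run cleanup code. The sketch below assumes a hypothetical `run_until_stopped` helper and is not the project's implementation.

```python
import time
from pathlib import Path
from typing import Callable


def run_until_stopped(
    stop_file: Path,
    work: Callable[[], None],
    poll_interval: float = 0.5,  # seconds between checks (assumed value)
) -> int:
    """Run `work` repeatedly until `stop_file` exists.

    Returns the number of completed iterations.
    """
    iterations = 0
    try:
        while not stop_file.exists():
            work()
            iterations += 1
            time.sleep(poll_interval)
    finally:
        # Remove the marker so the next run does not stop immediately.
        stop_file.unlink(missing_ok=True)
    return iterations
```

Another process requests shutdown simply by creating the file, for example with `Path(...).touch()`, and the worker finishes its current iteration before exiting.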
v0.37.1
What’s Changed
- fix(docs): correct AHK v2 script and use SIGINT (#125) @basnijholt
- fix(docs): correct AutoHotkey v2 syntax in Windows installation guide (#124) @basnijholt