
Commit c48b13a

feat(scripts): add MLX-LM server script for fast Apple Silicon inference
- Add run-mlx-lm.sh script that starts an OpenAI-compatible MLX LLM server
- Default model: mlx-community/Qwen3-4B-4bit (configurable via MLX_MODEL env var)
- Runs on port 10500 (configurable via MLX_PORT env var)
- Only works on Apple Silicon (M1/M2/M3/M4)
- Add MLX-LLM pane to start-all-services.sh Zellij layout
- Update README with MLX-LM documentation
1 parent b980e68 commit c48b13a

File tree

3 files changed: +64 -5 lines

- README.md
- scripts/run-mlx-lm.sh
- scripts/start-all-services.sh


README.md

Lines changed: 4 additions & 0 deletions
@@ -298,6 +298,7 @@ Our installation scripts automatically handle all dependencies:
 | Service | Purpose | Auto-installed? |
 |---------|---------|-----------------|
 | **[Ollama](https://ollama.ai/)** | Local LLM for text processing | ✅ Yes, with default model |
+| **[MLX-LM](https://github.com/ml-explore/mlx-lm)** | Fast LLM on Apple Silicon | ⚙️ Optional, via `uvx` |
 | **[Wyoming Faster Whisper](https://github.com/rhasspy/wyoming-faster-whisper)** | Speech-to-text | ✅ Yes, via `uvx` |
 | **[Wyoming Piper](https://github.com/rhasspy/wyoming-piper)** | Text-to-speech | ✅ Yes, via `uvx` |
 | **[Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI)** | Premium TTS (optional) | ⚙️ Can be added later |
@@ -318,10 +319,13 @@ You can also use other OpenAI-compatible local servers:
 
 | Server | Purpose | Setup Required |
 |---------|---------|----------------|
+| **[MLX-LM](https://github.com/ml-explore/mlx-lm)** | Fast LLM inference on Apple Silicon | `./scripts/run-mlx-lm.sh` or use `--openai-base-url http://localhost:10500/v1` |
 | **llama.cpp** | Local LLM inference | Use `--openai-base-url http://localhost:8080/v1` |
 | **vLLM** | High-performance LLM serving | Use `--openai-base-url` with server endpoint |
 | **Ollama** | Default local LLM | Already configured as default |
 
+> **Apple Silicon Users**: MLX-LM provides significantly faster inference than Ollama on M1/M2/M3/M4 Macs. Start it with `./scripts/run-mlx-lm.sh` and use `--llm-provider openai --openai-base-url http://localhost:10500/v1` to connect.
+
 ## Usage
 
 This package provides multiple command-line tools, each designed for a specific purpose.
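For Apple Silicon users following that note, here is a minimal sketch of both the one-off and the persistent setup. The flag names, config path, and key names are the ones run-mlx-lm.sh itself prints (see the script below); the append-to-config step assumes `[defaults]` is not already defined in the file.

```bash
# One-off: point a single agent-cli command at the local MLX-LM server.
agent-cli autocorrect --llm-provider openai \
  --openai-base-url http://localhost:10500/v1 \
  --llm-openai-model mlx-community/Qwen3-4B-4bit

# Persistent: append the settings the script suggests to the agent-cli config.
# Assumes ~/.config/agent-cli/config.toml has no existing [defaults] section.
mkdir -p ~/.config/agent-cli
cat >> ~/.config/agent-cli/config.toml <<'EOF'
[defaults]
llm_provider = "openai"
openai_base_url = "http://localhost:10500/v1"
llm_openai_model = "mlx-community/Qwen3-4B-4bit"
EOF
```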

scripts/run-mlx-lm.sh

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+#!/usr/bin/env bash
+echo "🧠 Starting MLX LLM Server..."
+
+# Check if running on macOS with Apple Silicon
+if [[ "$(uname)" != "Darwin" ]]; then
+  echo "❌ MLX only works on macOS with Apple Silicon."
+  exit 1
+fi
+
+if [[ "$(uname -m)" != "arm64" ]]; then
+  echo "❌ MLX requires Apple Silicon (M1/M2/M3/M4). Intel Macs are not supported."
+  exit 1
+fi
+
+# Default model - can be overridden with MLX_MODEL environment variable
+# Popular options:
+#   - mlx-community/Qwen3-4B-4bit (fast, high quality, default)
+#   - mlx-community/Qwen3-8B-4bit (larger, even better quality)
+#   - mlx-community/gpt-oss-20b-MXFP4-Q8 (20B parameters, high quality)
+MODEL="${MLX_MODEL:-mlx-community/Qwen3-4B-4bit}"
+PORT="${MLX_PORT:-10500}"
+
+echo "📦 Model: $MODEL"
+echo "🔌 Port: $PORT"
+echo ""
+echo "Usage with agent-cli:"
+echo "  agent-cli transcribe --llm --llm-provider openai --openai-base-url http://localhost:$PORT/v1 --llm-openai-model $MODEL"
+echo "  agent-cli autocorrect --llm-provider openai --openai-base-url http://localhost:$PORT/v1 --llm-openai-model $MODEL"
+echo ""
+echo "To make MLX the default, add to ~/.config/agent-cli/config.toml:"
+echo "  [defaults]"
+echo "  llm_provider = \"openai\""
+echo "  openai_base_url = \"http://localhost:$PORT/v1\""
+echo "  llm_openai_model = \"$MODEL\""
+echo ""
+
+# Run mlx-lm server using uvx
+# --host 0.0.0.0 allows connections from other machines/tools
+uvx --python 3.12 \
+  --from "mlx-lm" \
+  mlx_lm.server \
+  --model "$MODEL" \
+  --host 0.0.0.0 \
+  --port "$PORT"
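Because MODEL and PORT fall back to their defaults only when the corresponding environment variables are unset, the script can be redirected without editing it. A hedged sketch follows; the curl call assumes mlx_lm.server exposes the usual OpenAI-style `/v1/chat/completions` route.

```bash
# Run with a larger model on a non-default port (both overrides are optional).
MLX_MODEL="mlx-community/Qwen3-8B-4bit" MLX_PORT=11000 ./scripts/run-mlx-lm.sh

# Quick smoke test against the default port once the server is up
# (assumes an OpenAI-compatible /v1/chat/completions endpoint).
curl -s http://localhost:10500/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mlx-community/Qwen3-4B-4bit", "messages": [{"role": "user", "content": "Say hi"}]}'
```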

scripts/start-all-services.sh

Lines changed: 16 additions & 5 deletions
@@ -10,6 +10,21 @@ fi
 # Get the current directory
 SCRIPTS_DIR="$(cd "$(dirname "$0")" && pwd)"
 
+# Determine LLM pane based on platform
+# Use MLX-LLM on macOS ARM (Apple Silicon), Ollama otherwise
+if [[ "$(uname)" == "Darwin" && "$(uname -m)" == "arm64" ]]; then
+  LLM_PANE='            pane {
+                name "MLX-LLM"
+                cwd "'"$SCRIPTS_DIR"'"
+                command "./run-mlx-lm.sh"
+            }'
+else
+  LLM_PANE='            pane {
+                name "Ollama"
+                command "ollama"
+                args "serve"
+            }'
+fi
 
 # Create .runtime directory and Zellij layout file
 mkdir -p "$SCRIPTS_DIR/.runtime"
@@ -19,11 +34,7 @@ session_name "agent-cli"
 layout {
     pane split_direction="vertical" {
         pane split_direction="horizontal" {
-            pane {
-                name "Ollama"
-                command "ollama"
-                args "serve"
-            }
+$LLM_PANE
             pane {
                 name "Help"
                 command "sh"
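The platform check runs at layout-generation time, so the choice can be previewed without launching Zellij. A small sketch mirroring the same test and simply reporting which pane the generated layout would contain:

```bash
# Mirror of the platform check in start-all-services.sh: prints which LLM pane
# the generated Zellij layout would contain on this machine.
if [[ "$(uname)" == "Darwin" && "$(uname -m)" == "arm64" ]]; then
  echo "MLX-LLM pane -> ./run-mlx-lm.sh"
else
  echo "Ollama pane -> ollama serve"
fi
```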
