
Conversation

@basnijholt (Owner)

Summary

  • Add run-mlx-lm.sh script that starts an OpenAI-compatible MLX LLM server on port 8080 (sketched below)
  • Default model: mlx-community/Qwen3-4B-4bit (configurable via MLX_MODEL env var)
  • Works only on Apple Silicon (M1/M2/M3/M4); provides significantly faster inference than Ollama
  • Add MLX-LLM pane to start-all-services.sh Zellij layout
  • Update README with MLX-LM documentation
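
For context, here is a minimal sketch of what the script likely boils down to, assuming it wraps the mlx_lm.server entry point from the mlx-lm package; the Apple Silicon check and the exact invocation below are assumptions, not the actual script:

#!/usr/bin/env bash
# Hypothetical sketch of scripts/run-mlx-lm.sh, assuming it wraps mlx_lm.server.
set -euo pipefail

# MLX only runs on Apple Silicon, so bail out early elsewhere (assumed check).
if [[ "$(uname -sm)" != "Darwin arm64" ]]; then
  echo "run-mlx-lm.sh requires an Apple Silicon Mac (M1/M2/M3/M4)" >&2
  exit 1
fi

# Model and port default to the values above; both are overridable via env vars.
MLX_MODEL="${MLX_MODEL:-mlx-community/Qwen3-4B-4bit}"
MLX_PORT="${MLX_PORT:-8080}"

# mlx_lm.server exposes an OpenAI-compatible API on the chosen port.
exec mlx_lm.server --model "$MLX_MODEL" --port "$MLX_PORT"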

Usage

# Start the server
./scripts/run-mlx-lm.sh

# Or with a custom model
MLX_MODEL=mlx-community/Qwen3-8B-4bit ./scripts/run-mlx-lm.sh

# Use with agent-cli
agent-cli transcribe --llm --llm-provider openai --openai-base-url http://localhost:8080/v1
agent-cli autocorrect --llm-provider openai --openai-base-url http://localhost:8080/v1
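
The listening port can also be changed via the MLX_PORT env var (see the commit message below); the agent-cli base URL just needs to follow it:

# Or on a different port
MLX_PORT=8081 ./scripts/run-mlx-lm.sh
agent-cli autocorrect --llm-provider openai --openai-base-url http://localhost:8081/v1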

Test plan

  • Run ./scripts/run-mlx-lm.sh on an Apple Silicon Mac
  • Verify the server starts and responds to requests (see the curl check below)
  • Test with agent-cli transcribe --llm --llm-provider openai --openai-base-url http://localhost:8080/v1
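
A quick way to cover the second item, assuming the server exposes the standard OpenAI chat-completions route (which mlx_lm.server implements):

# Smoke test: should return a JSON chat completion from the local server
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mlx-community/Qwen3-4B-4bit", "messages": [{"role": "user", "content": "Say hello"}]}'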

@basnijholt force-pushed the mlx branch 3 times, most recently from 18779ff to 1ce6b6d on December 5, 2025 at 05:32
- Add run-mlx-lm.sh script that starts an OpenAI-compatible MLX LLM server
- Default model: mlx-community/Qwen3-4B-4bit (configurable via MLX_MODEL env var)
- Runs on port 8080 (configurable via MLX_PORT env var)
- Only works on Apple Silicon (M1/M2/M3/M4)
- Add MLX-LLM pane to start-all-services.sh Zellij layout
- Update README with MLX-LM documentation