website/developer-tools/cli-tools.md
Lines changed: 39 additions & 1 deletion
Original file line number
Diff line number
Diff line change
@@ -7,7 +7,7 @@ sidebar_position: 3
7
7
8
8
[Modern terminals](/developer-tools/terminals) combined with CLI tools achieve feature parity with traditional IDEs—ripgrep + fzf for global search, yazi for file exploration, tmux/Zellij for pane management, lazygit for git operations. For multi-agent development, this stack becomes critical infrastructure: session persistence across disconnects, rapid context switching between worktrees, and efficient file operations without breaking flow.
9
9
10
-
**Six categories:** Search & discovery (ripgrep, fd), text editing & inspection (micro, bat), file navigation (eza, yazi, fzf, zoxide), session management (tmux, Zellij), shell history (Atuin), and git operations (lazygit) address the most frequent CLI tasks in multi-agent development workflows.
10
+
**Seven categories:** Search & discovery (ripgrep, fd), text editing & inspection (micro, bat), file navigation (eza, yazi, fzf, zoxide), session management (tmux, Zellij), shell history (Atuin), git operations (lazygit), and browser automation (agent-browser) address the most frequent CLI tasks in multi-agent development workflows.
[**agent-browser**](https://agent-browser.dev/) is a Rust-based CLI for browser automation designed specifically for AI agents. It ships as a native binary with cross-platform support and works with any agent that can run shell commands.
378
+
379
+
**Key differentiators:** Ref-based accessibility tree system returns compact snapshots with deterministic element references (`@e1`, `@e2`)—agents click by ref instead of fragile CSS selectors or XPath. Token-efficient output (200-400 tokens per snapshot vs 5,000-15,000 for full DOM) preserves agent context window. 50+ commands cover navigation, forms, screenshots, network inspection, and storage. Session support enables multiple isolated browser instances with separate authentication states. Native Rust CLI provides instant command parsing without Node.js or Python runtime overhead.
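To make the token-efficiency claim concrete, here is a rough back-of-the-envelope sketch of how many snapshots fit in a context window, using the worst-case figures from the ranges quoted above. The 200,000-token window and the helper name `snapshots_that_fit` are assumptions for illustration only, and the calculation ignores everything else that competes for context (prompts, code, tool output).

```python
# Rough context-budget arithmetic using the token ranges quoted above.
# CONTEXT_WINDOW is an assumed figure, not a property of any specific model.

def snapshots_that_fit(context_window: int, tokens_per_snapshot: int) -> int:
    """Return how many snapshots of the given size fit in the window."""
    if tokens_per_snapshot <= 0:
        raise ValueError("tokens_per_snapshot must be positive")
    return context_window // tokens_per_snapshot


CONTEXT_WINDOW = 200_000  # assumption for illustration

# Worst case from each quoted range.
ref_snapshots = snapshots_that_fit(CONTEXT_WINDOW, 400)      # ref-based snapshot
dom_snapshots = snapshots_that_fit(CONTEXT_WINDOW, 15_000)   # full DOM dump

print(f"ref-based snapshots that fit: {ref_snapshots}")
print(f"full DOM dumps that fit: {dom_snapshots}")
```

Under these assumptions, the gap is roughly hundreds of ref-based snapshots versus about a dozen full DOM dumps before the window is exhausted.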
380
+
381
+
**Best suited for:** AI-assisted workflows where agents need to interact with web UIs—testing changes in browser, filling forms, extracting data, validating deployments. Engineers using CLI-based agents (Claude Code, Cursor, Copilot) who need browser automation without MCP server setup. Developers wanting deterministic element selection over screenshot-based visual parsing or brittle selector strategies.
382
+
383
+
**Trade-offs:** Ref-based selection requires a snapshot before any interaction (two commands minimum). It relies on the accessibility tree, which may miss dynamically rendered content that lacks proper ARIA attributes, so ensure target applications use semantic markup.
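The snapshot-before-interaction constraint can be illustrated with a toy model. This is a conceptual sketch only, not agent-browser's implementation: `Element`, `ToyPage`, and the snapshot line format are hypothetical names invented for the example. It shows why a ref such as `@e2` only has meaning after a snapshot has assigned it.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical accessibility-tree node: a role plus an accessible name."""
    role: str
    name: str


class ToyPage:
    """Toy page model: refs exist only after snapshot() has assigned them."""

    def __init__(self, elements):
        self._elements = list(elements)
        self._refs = {}  # ref string -> Element, filled by snapshot()

    def snapshot(self):
        """Assign refs @e1, @e2, ... in tree order and return a compact listing."""
        self._refs = {
            f"@e{i}": el for i, el in enumerate(self._elements, start=1)
        }
        return [f'{ref} {el.role} "{el.name}"' for ref, el in self._refs.items()]

    def click(self, ref):
        """Resolve a ref from the latest snapshot; fail if none was taken."""
        if ref not in self._refs:
            raise LookupError(f"unknown ref {ref!r}; take a snapshot first")
        return self._refs[ref]


page = ToyPage([Element("textbox", "Email"), Element("button", "Submit")])
for line in page.snapshot():   # step 1: snapshot assigns refs
    print(line)
target = page.click("@e2")     # step 2: interact by ref
print(f"clicked {target.role} {target.name!r}")
```

In this model, calling `click` before `snapshot` raises an error, which mirrors the two-command minimum described above.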
agent-browser click @e2 # Click by ref—deterministic, no selector fragility
391
+
agent-browser screenshot page.png
392
+
agent-browser close
393
+
```
394
+
395
+
**Installation:**
396
+
397
+
```bash
398
+
# npm (recommended)
399
+
npm install -g agent-browser
400
+
401
+
# Verify installation
402
+
agent-browser --version
403
+
```
404
+
405
+
Requirements: Node.js 18+ for npm installation. Chromium-based browser (bundled or system Chrome).
406
+
407
+
:::tip Why Ref-Based Automation Wins
408
+
agent-browser's ref-based approach (`@e1`, `@e2`) selects elements deterministically, avoiding the fragility of CSS selectors and XPath. The accessibility tree snapshot captures semantic structure rather than visual layout, so agents work from what elements *are*, not where they appear on screen. The result is automation that is more likely to survive UI changes.
website/developer-tools/mcp-servers.md
Lines changed: 10 additions & 68 deletions
Original file line number
Diff line number
Diff line change
@@ -7,7 +7,7 @@ sidebar_position: 4
7
7
8
8
The [Model Context Protocol (MCP)](https://modelcontextprotocol.io) extends CLI agents with specialized capabilities—code research, web grounding, browser automation. While IDE-based assistants (Cursor, Windsurf) often include these features built-in, CLI agents (Claude Code, Copilot CLI, Aider) rely on MCP servers to add functionality beyond basic file operations.
9
9
10
-
These three MCP servers address the critical gaps in AI-assisted development workflows.
10
+
These MCP servers address the critical gaps in AI-assisted development workflows.
11
11
12
12
## Code Research
13
13
@@ -73,76 +73,18 @@ Requires Go 1.23+ and Google API credentials. See [ArguSeek on GitHub](https://g
73
73
74
74
## Browser Automation
75
75
76
-
Two major options for browser automation—both provide comprehensive tooling, differ in maturity and optimization approach.
76
+
Browser automation for AI agents is handled by the **agent-browser CLI**—a purpose-built tool that delivers consistently better results than MCP-based alternatives.
77
77
78
-
### Playwright MCP
78
+
See [agent-browser in CLI Tools](/developer-tools/cli-tools#agent-browser) for installation and usage.
79
79
80
-
[Playwright MCP](https://github.com/microsoft/playwright-mcp) is the official browser automation server from Microsoft, built on the Playwright testing framework. Most popular MCP server on GitHub for browser automation.
80
+
**Why CLI over MCP for browser automation:**
81
+
- **Better results:** Ref-based accessibility tree produces deterministic, reliable element selection
82
+
- **Token efficient:** 500-2,000 tokens per snapshot vs 5,000-15,000 for MCP DOM dumps
83
+
- **Simpler setup:** No MCP configuration, works with any shell-capable agent
84
+
- **Faster iteration:** Native Rust CLI with instant command parsing
81
85
82
-
**What it does:**
83
-
84
-
- Accessibility tree approach (not screenshots)—LLM-friendly structured data from the DOM
85
-
- Full browser automation via Playwright—navigate, click, type, extract data
86
-
- Automated testing and exploration—generate tests, reproduce bugs, validate UX from natural language
- Accessibility-first automation—semantic DOM structure over visual parsing
94
-
95
-
**Key advantage:** High popularity and mature testing ecosystem. Accessibility tree provides clean, structured text that LLMs interpret reliably without visual processing overhead.
96
-
97
-
**Installation:**
98
-
99
-
```bash
100
-
npx @playwright/mcp@latest
101
-
```
102
-
103
-
Requires Node.js 18+. See [Playwright MCP on GitHub](https://github.com/microsoft/playwright-mcp) for MCP client configuration.
104
-
105
-
### Chrome DevTools MCP
106
-
107
-
[Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp) is the official browser automation server from the Google Chrome team, purpose-built for MCP workflows with context optimization.
108
-
109
-
**What it does:** (26+ professional tools)
110
-
111
-
- Performance analysis—run traces, extract LCP, blocking time, actionable metrics
112
-
- Advanced debugging—analyze network requests (CORS, failed loads), inspect console logs, take DOM snapshots
113
-
- Reliable automation—simulate user interactions (click, type, navigate) via Puppeteer
114
-
- Emulation—CPU throttling, network speed, viewport size for testing under constraints
115
-
116
-
**When to use it:**
117
-
118
-
- Performance-focused workflows—deep Chrome DevTools integration for profiling and optimization
119
-
- Context-optimized preference—newer tool designed specifically for MCP agent use cases
**Key capability:** Closes the "write code → run → verify" loop—agents test their changes in the browser and iterate based on actual behavior.
123
-
124
-
**Installation:**
125
-
126
-
```bash
127
-
npx chrome-devtools-mcp@latest
128
-
```
129
-
130
-
See [Chrome DevTools MCP on GitHub](https://github.com/ChromeDevTools/chrome-devtools-mcp) for MCP client configuration.
131
-
132
-
### Choosing Between Them
133
-
134
-
**Playwright MCP:** More popular with broader GitHub community, mature testing ecosystem, established Playwright foundation. Best for standard testing workflows and accessibility-first automation.
135
-
136
-
**Chrome DevTools MCP:** Newer and purpose-built for MCP, context-optimized by the Chrome team, performance analysis focus. Best for Chrome-specific debugging and profiling workflows.
137
-
138
-
Both provide comprehensive browser automation with similar scope (~26 tools). The choice depends on ecosystem preference and whether you prioritize maturity (Playwright) or MCP-specific optimization (CDP).
139
-
140
-
:::tip Run Browser Automation in Sub-Agents
141
-
Browser automation generates high token volumes—DOM snapshots (5,000-15,000 tokens), screenshots (3,000-8,000 tokens), network traces (2,000-10,000 tokens). Multiple operations quickly fill your context window.
142
-
143
-
**Best practice:** Delegate browser tasks to sub-agents. The sub-agent processes DOM data and screenshots in its isolated context, then returns a concise synthesis: "Button at selector `.submit-btn` clicked, form submitted successfully, redirected to `/dashboard`" (50 tokens instead of 15,000-token DOM dump).
144
-
145
-
See [Lesson 5: Sub-Agents for Context Isolation](/docs/methodology/lesson-5-grounding#solution-2-sub-agents-for-context-isolation) for architecture details.
86
+
:::note Deprecated: MCP Browser Servers
87
+
Previous recommendations included Playwright MCP and Chrome DevTools MCP. These are now deprecated for agentic workflows—agent-browser's ref-based approach delivers more reliable automation with lower token overhead. The MCP servers remain available for legacy integrations but are not recommended for new projects.