Commit 71bdf30

Updating toolbox for agent-browser cli
1 parent 8281cbd commit 71bdf30

2 files changed

+49
-69
lines changed

website/developer-tools/cli-tools.md

Lines changed: 39 additions & 1 deletion
@@ -7,7 +7,7 @@ sidebar_position: 3
77

88
[Modern terminals](/developer-tools/terminals) combined with CLI tools achieve feature parity with traditional IDEs—ripgrep + fzf for global search, yazi for file exploration, tmux/Zellij for pane management, lazygit for git operations. For multi-agent development, this stack becomes critical infrastructure: session persistence across disconnects, rapid context switching between worktrees, and efficient file operations without breaking flow.
99

10-
**Six categories:** Search & discovery (ripgrep, fd), text editing & inspection (micro, bat), file navigation (eza, yazi, fzf, zoxide), session management (tmux, Zellij), shell history (Atuin), and git operations (lazygit) address the most frequent CLI tasks in multi-agent development workflows.
10+
**Seven categories:** Search & discovery (ripgrep, fd), text editing & inspection (micro, bat), file navigation (eza, yazi, fzf, zoxide), session management (tmux, Zellij), shell history (Atuin), git operations (lazygit), and browser automation (agent-browser) address the most frequent CLI tasks in multi-agent development workflows.
1111

1212
## Search & Discovery Tools
1313

@@ -370,6 +370,44 @@ sudo pacman -S zellij # Arch
370370
# Others: check https://zellij.dev/documentation/installation
371371
```
372372

373+
## Browser Automation
374+
375+
### agent-browser
376+
377+
[**agent-browser**](https://agent-browser.dev/) is a Rust-based CLI for browser automation designed specifically for AI agents. It ships as a native binary, runs cross-platform, and works with any agent that can run shell commands.
378+
379+
**Key differentiators:** Ref-based accessibility tree system returns compact snapshots with deterministic element references (`@e1`, `@e2`)—agents click by ref instead of fragile CSS selectors or XPath. Token-efficient output (200-400 tokens per snapshot vs 5,000-15,000 for full DOM) preserves agent context window. 50+ commands cover navigation, forms, screenshots, network inspection, and storage. Session support enables multiple isolated browser instances with separate authentication states. Native Rust CLI provides instant command parsing without Node.js or Python runtime overhead.
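
As a rough illustration of what those snapshot sizes mean in practice, the arithmetic below compares how many snapshots fit in a fixed token budget. The 100,000-token budget is an assumed figure chosen for illustration; the per-snapshot sizes are the upper bounds quoted above.

```bash
# Assumed context budget, for illustration only (not a figure from agent-browser).
budget=100000
ref_snapshot=400      # upper bound quoted above for a ref-based snapshot
dom_snapshot=15000    # upper bound quoted above for a full DOM dump

# Integer division: whole snapshots that fit in the budget.
ref_fit=$((budget / ref_snapshot))
dom_fit=$((budget / dom_snapshot))

echo "ref-based snapshots that fit: $ref_fit"
echo "full-DOM snapshots that fit:  $dom_fit"
```

Under these assumptions the gap is well over an order of magnitude, which is the reason compact snapshots matter for long agent sessions.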
380+
381+
**Best suited for:** AI-assisted workflows where agents need to interact with web UIs—testing changes in browser, filling forms, extracting data, validating deployments. Engineers using CLI-based agents (Claude Code, Cursor, Copilot) who need browser automation without MCP server setup. Developers wanting deterministic element selection over screenshot-based visual parsing or brittle selector strategies.
382+
383+
**Trade-offs:** Ref-based selection requires a snapshot before any interaction, so every action takes at least two commands. The tool relies on the accessibility tree, which may miss dynamically rendered content that lacks proper ARIA attributes, so make sure target applications use semantic markup.
384+
385+
**Example workflow:**
386+
387+
```bash
388+
agent-browser open example.com
389+
agent-browser snapshot -i # Returns refs: [ref=@e1] "Example Domain", [ref=@e2] "More information..."
390+
agent-browser click @e2 # Click by ref—deterministic, no selector fragility
391+
agent-browser screenshot page.png
392+
agent-browser close
393+
```
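
To make the snapshot-then-act step concrete, here is a minimal sketch of pulling refs out of snapshot text so a script can decide what to click. It assumes snapshot lines look like the comment in the workflow above (`[ref=@e1] "Example Domain"`); the real output format may differ. `extract_refs` is a hypothetical helper, not part of agent-browser.

```bash
# Hypothetical helper: list the unique element refs found in snapshot text on stdin.
# Assumes refs appear as "@eN", as in the workflow comment above; check the
# actual `agent-browser snapshot -i` output before relying on this pattern.
extract_refs() {
  grep -oE '@e[0-9]+' | sort -u
}

# Sample text mirroring the workflow comment (not real tool output).
sample='[ref=@e1] "Example Domain"
[ref=@e2] "More information..."'

printf '%s\n' "$sample" | extract_refs
```

In a real session the sample text would be replaced by piping the snapshot command's output into the helper.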
394+
395+
**Installation:**
396+
397+
```bash
398+
# npm (recommended)
399+
npm install -g agent-browser
400+
401+
# Verify installation
402+
agent-browser --version
403+
```
404+
405+
Requirements: Node.js 18+ for npm installation. Chromium-based browser (bundled or system Chrome).
406+
407+
:::tip Why Ref-Based Automation Wins
408+
agent-browser's ref-based approach (`@e1`, `@e2`) selects elements deterministically, avoiding the fragility of CSS selectors and XPath. The accessibility tree snapshot captures semantic structure, not visual layout, so agents work with what elements *are* rather than where they appear on screen. The resulting automation is more likely to survive visual and layout changes to the UI.
409+
:::
410+
373411
---
374412

375413
**Related Course Content:**

website/developer-tools/mcp-servers.md

Lines changed: 10 additions & 68 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ sidebar_position: 4
77

88
The [Model Context Protocol (MCP)](https://modelcontextprotocol.io) extends CLI agents with specialized capabilities—code research, web grounding, browser automation. While IDE-based assistants (Cursor, Windsurf) often include these features built-in, CLI agents (Claude Code, Copilot CLI, Aider) rely on MCP servers to add functionality beyond basic file operations.
99

10-
These three MCP servers address the critical gaps in AI-assisted development workflows.
10+
These MCP servers address the critical gaps in AI-assisted development workflows.
1111

1212
## Code Research
1313

@@ -73,76 +73,18 @@ Requires Go 1.23+ and Google API credentials. See [ArguSeek on GitHub](https://g
7373

7474
## Browser Automation
7575

76-
Two major options for browser automation—both provide comprehensive tooling, differ in maturity and optimization approach.
76+
Browser automation for AI agents is handled by the **agent-browser CLI**—a purpose-built tool that delivers consistently better results than MCP-based alternatives.
7777

78-
### Playwright MCP
78+
See [agent-browser in CLI Tools](/developer-tools/cli-tools#agent-browser) for installation and usage.
7979

80-
[Playwright MCP](https://github.com/microsoft/playwright-mcp) is the official browser automation server from Microsoft, built on the Playwright testing framework. Most popular MCP server on GitHub for browser automation.
80+
**Why CLI over MCP for browser automation:**
81+
- **Better results:** Ref-based accessibility tree produces deterministic, reliable element selection
82+
- **Token efficient:** 500-2000 tokens per snapshot vs 5,000-15,000 for MCP DOM dumps
83+
- **Simpler setup:** No MCP configuration, works with any shell-capable agent
84+
- **Faster iteration:** Native Rust CLI with instant command parsing
8185

82-
**What it does:**
83-
84-
- Accessibility tree approach (not screenshots)—LLM-friendly structured data from the DOM
85-
- Full browser automation via Playwright—navigate, click, type, extract data
86-
- Automated testing and exploration—generate tests, reproduce bugs, validate UX from natural language
87-
- Self-verifying workflows—agents modify code, launch browser, interact with UI, confirm expected behavior
88-
89-
**When to use it:**
90-
91-
- Mature ecosystem preference—established Playwright foundation with broad community support
92-
- Testing-focused workflows—leverages Playwright's end-to-end testing patterns
93-
- Accessibility-first automation—semantic DOM structure over visual parsing
94-
95-
**Key advantage:** High popularity and mature testing ecosystem. Accessibility tree provides clean, structured text that LLMs interpret reliably without visual processing overhead.
96-
97-
**Installation:**
98-
99-
```bash
100-
npx @playwright/mcp@latest
101-
```
102-
103-
Requires Node.js 18+. See [Playwright MCP on GitHub](https://github.com/microsoft/playwright-mcp) for MCP client configuration.
104-
105-
### Chrome DevTools MCP
106-
107-
[Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp) is the official browser automation server from the Google Chrome team, purpose-built for MCP workflows with context optimization.
108-
109-
**What it does:** (26+ professional tools)
110-
111-
- Performance analysis—run traces, extract LCP, blocking time, actionable metrics
112-
- Advanced debugging—analyze network requests (CORS, failed loads), inspect console logs, take DOM snapshots
113-
- Reliable automation—simulate user interactions (click, type, navigate) via Puppeteer
114-
- Emulation—CPU throttling, network speed, viewport size for testing under constraints
115-
116-
**When to use it:**
117-
118-
- Performance-focused workflows—deep Chrome DevTools integration for profiling and optimization
119-
- Context-optimized preference—newer tool designed specifically for MCP agent use cases
120-
- Chrome-specific features—leverage proprietary DevTools Protocol capabilities
121-
122-
**Key capability:** Closes the "write code → run → verify" loop—agents test their changes in the browser and iterate based on actual behavior.
123-
124-
**Installation:**
125-
126-
```bash
127-
npx chrome-devtools-mcp@latest
128-
```
129-
130-
See [Chrome DevTools MCP on GitHub](https://github.com/ChromeDevTools/chrome-devtools-mcp) for MCP client configuration.
131-
132-
### Choosing Between Them
133-
134-
**Playwright MCP:** More popular with broader GitHub community, mature testing ecosystem, established Playwright foundation. Best for standard testing workflows and accessibility-first automation.
135-
136-
**Chrome DevTools MCP:** Newer and purpose-built for MCP, context-optimized by the Chrome team, performance analysis focus. Best for Chrome-specific debugging and profiling workflows.
137-
138-
Both provide comprehensive browser automation with similar scope (~26 tools). The choice depends on ecosystem preference and whether you prioritize maturity (Playwright) or MCP-specific optimization (CDP).
139-
140-
:::tip Run Browser Automation in Sub-Agents
141-
Browser automation generates high token volumes—DOM snapshots (5,000-15,000 tokens), screenshots (3,000-8,000 tokens), network traces (2,000-10,000 tokens). Multiple operations quickly fill your context window.
142-
143-
**Best practice:** Delegate browser tasks to sub-agents. The sub-agent processes DOM data and screenshots in its isolated context, then returns a concise synthesis: "Button at selector `.submit-btn` clicked, form submitted successfully, redirected to `/dashboard`" (50 tokens instead of 15,000-token DOM dump).
144-
145-
See [Lesson 5: Sub-Agents for Context Isolation](/docs/methodology/lesson-5-grounding#solution-2-sub-agents-for-context-isolation) for architecture details.
86+
:::note Deprecated: MCP Browser Servers
87+
Previous recommendations included Playwright MCP and Chrome DevTools MCP. These are now deprecated for agentic workflows—agent-browser's ref-based approach delivers more reliable automation with lower token overhead. The MCP servers remain available for legacy integrations but are not recommended for new projects.
14688
:::
14789

14890
---
