Updating toolbox for agent-browser cli

ofriw · ofriw · commit 71bdf30f0bec · 2026-02-04T17:40:30.000+02:00
diff --git a/website/developer-tools/cli-tools.md b/website/developer-tools/cli-tools.md
@@ -7,7 +7,7 @@ sidebar_position: 3
 
 [Modern terminals](/developer-tools/terminals) combined with CLI tools achieve feature parity with traditional IDEs—ripgrep + fzf for global search, yazi for file exploration, tmux/Zellij for pane management, lazygit for git operations. For multi-agent development, this stack becomes critical infrastructure: session persistence across disconnects, rapid context switching between worktrees, and efficient file operations without breaking flow.
 
-**Six categories:** Search & discovery (ripgrep, fd), text editing & inspection (micro, bat), file navigation (eza, yazi, fzf, zoxide), session management (tmux, Zellij), shell history (Atuin), and git operations (lazygit) address the most frequent CLI tasks in multi-agent development workflows.
+**Seven categories:** Search & discovery (ripgrep, fd), text editing & inspection (micro, bat), file navigation (eza, yazi, fzf, zoxide), session management (tmux, Zellij), shell history (Atuin), git operations (lazygit), and browser automation (agent-browser) address the most frequent CLI tasks in multi-agent development workflows.
 
 ## Search & Discovery Tools
 
@@ -370,6 +370,44 @@ sudo pacman -S zellij       # Arch
 # Others: check https://zellij.dev/documentation/installation
 ```
 
+## Browser Automation
+
+### agent-browser
+
+[**agent-browser**](https://agent-browser.dev/) is a Rust-based CLI for browser automation designed specifically for AI agents. It ships as a native binary with cross-platform support and works with any agent that can run shell commands.
+
+**Key differentiators:** A ref-based accessibility tree system returns compact snapshots with deterministic element references (`@e1`, `@e2`), so agents click by ref instead of relying on fragile CSS selectors or XPath. Token-efficient output (200-400 tokens per snapshot vs 5,000-15,000 for a full DOM) preserves the agent's context window. 50+ commands cover navigation, forms, screenshots, network inspection, and storage. Session support enables multiple isolated browser instances with separate authentication states. The native Rust CLI parses commands instantly, without Node.js or Python runtime overhead.
+
+**Best suited for:** AI-assisted workflows where agents need to interact with web UIs: testing changes in the browser, filling forms, extracting data, validating deployments. Engineers using shell-capable agents (Claude Code, Cursor, Copilot) who need browser automation without MCP server setup. Developers who want deterministic element selection over screenshot-based visual parsing or brittle selector strategies.
+
+**Trade-offs:** Ref-based selection requires a snapshot before any interaction (two commands minimum). It relies on the accessibility tree, which may miss dynamically rendered content that lacks proper ARIA attributes, so make sure target applications use semantic markup.
+
+**Example workflow:**
+
+```bash
+agent-browser open example.com
+agent-browser snapshot -i        # Returns refs: [ref=@e1] "Example Domain", [ref=@e2] "More information..."
+agent-browser click @e2          # Click by ref—deterministic, no selector fragility
+agent-browser screenshot page.png
+agent-browser close
+```
+
+**Installation:**
+
+```bash
+# npm (recommended)
+npm install -g agent-browser
+
+# Verify installation
+agent-browser --version
+```
+
+Requirements: Node.js 18+ for the npm installation, plus a Chromium-based browser (bundled or system Chrome).
+
+:::tip Why Ref-Based Automation Wins
+agent-browser's ref-based approach (`@e1`, `@e2`) gives agents deterministic element selection without hand-written CSS selectors or XPath. The accessibility tree snapshot captures semantic structure, not visual layout, so agents understand what elements *are* rather than where they appear on screen. Automation built this way tends to survive styling and layout changes that break selector-based scripts.
+:::
+
 ---
 
 **Related Course Content:**
diff --git a/website/developer-tools/mcp-servers.md b/website/developer-tools/mcp-servers.md
@@ -7,7 +7,7 @@ sidebar_position: 4
 
 The [Model Context Protocol (MCP)](https://modelcontextprotocol.io) extends CLI agents with specialized capabilities—code research, web grounding, browser automation. While IDE-based assistants (Cursor, Windsurf) often include these features built-in, CLI agents (Claude Code, Copilot CLI, Aider) rely on MCP servers to add functionality beyond basic file operations.
 
-These three MCP servers address the critical gaps in AI-assisted development workflows.
+These MCP servers address the critical gaps in AI-assisted development workflows.
 
 ## Code Research
 
@@ -73,76 +73,18 @@ Requires Go 1.23+ and Google API credentials. See [ArguSeek on GitHub](https://g
 
 ## Browser Automation
 
-Two major options for browser automation—both provide comprehensive tooling, differ in maturity and optimization approach.
+Browser automation for AI agents is handled by the **agent-browser CLI**, a purpose-built tool that this guide recommends over MCP-based alternatives for its deterministic element selection and lower token overhead.
 
-### Playwright MCP
+See [agent-browser in CLI Tools](/developer-tools/cli-tools#agent-browser) for installation and usage.
 
-[Playwright MCP](https://github.com/microsoft/playwright-mcp) is the official browser automation server from Microsoft, built on the Playwright testing framework. Most popular MCP server on GitHub for browser automation.
+**Why CLI over MCP for browser automation:**
+- **More reliable selection:** Ref-based accessibility tree snapshots give deterministic element references instead of fragile selectors
+- **Token efficient:** 200-400 tokens per snapshot vs 5,000-15,000 for MCP DOM dumps
+- **Simpler setup:** No MCP configuration, works with any shell-capable agent
+- **Faster iteration:** Native Rust CLI with instant command parsing
 
-**What it does:**
-
-- Accessibility tree approach (not screenshots)—LLM-friendly structured data from the DOM
-- Full browser automation via Playwright—navigate, click, type, extract data
-- Automated testing and exploration—generate tests, reproduce bugs, validate UX from natural language
-- Self-verifying workflows—agents modify code, launch browser, interact with UI, confirm expected behavior
-
-**When to use it:**
-
-- Mature ecosystem preference—established Playwright foundation with broad community support
-- Testing-focused workflows—leverages Playwright's end-to-end testing patterns
-- Accessibility-first automation—semantic DOM structure over visual parsing
-
-**Key advantage:** High popularity and mature testing ecosystem. Accessibility tree provides clean, structured text that LLMs interpret reliably without visual processing overhead.
-
-**Installation:**
-
-```bash
-npx @playwright/mcp@latest
-```
-
-Requires Node.js 18+. See [Playwright MCP on GitHub](https://github.com/microsoft/playwright-mcp) for MCP client configuration.
-
-### Chrome DevTools MCP
-
-[Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp) is the official browser automation server from the Google Chrome team, purpose-built for MCP workflows with context optimization.
-
-**What it does:** (26+ professional tools)
-
-- Performance analysis—run traces, extract LCP, blocking time, actionable metrics
-- Advanced debugging—analyze network requests (CORS, failed loads), inspect console logs, take DOM snapshots
-- Reliable automation—simulate user interactions (click, type, navigate) via Puppeteer
-- Emulation—CPU throttling, network speed, viewport size for testing under constraints
-
-**When to use it:**
-
-- Performance-focused workflows—deep Chrome DevTools integration for profiling and optimization
-- Context-optimized preference—newer tool designed specifically for MCP agent use cases
-- Chrome-specific features—leverage proprietary DevTools Protocol capabilities
-
-**Key capability:** Closes the "write code → run → verify" loop—agents test their changes in the browser and iterate based on actual behavior.
-
-**Installation:**
-
-```bash
-npx chrome-devtools-mcp@latest
-```
-
-See [Chrome DevTools MCP on GitHub](https://github.com/ChromeDevTools/chrome-devtools-mcp) for MCP client configuration.
-
-### Choosing Between Them
-
-**Playwright MCP:** More popular with broader GitHub community, mature testing ecosystem, established Playwright foundation. Best for standard testing workflows and accessibility-first automation.
-
-**Chrome DevTools MCP:** Newer and purpose-built for MCP, context-optimized by the Chrome team, performance analysis focus. Best for Chrome-specific debugging and profiling workflows.
-
-Both provide comprehensive browser automation with similar scope (~26 tools). The choice depends on ecosystem preference and whether you prioritize maturity (Playwright) or MCP-specific optimization (CDP).
-
-:::tip Run Browser Automation in Sub-Agents
-Browser automation generates high token volumes—DOM snapshots (5,000-15,000 tokens), screenshots (3,000-8,000 tokens), network traces (2,000-10,000 tokens). Multiple operations quickly fill your context window.
-
-**Best practice:** Delegate browser tasks to sub-agents. The sub-agent processes DOM data and screenshots in its isolated context, then returns a concise synthesis: "Button at selector `.submit-btn` clicked, form submitted successfully, redirected to `/dashboard`" (50 tokens instead of 15,000-token DOM dump).
-
-See [Lesson 5: Sub-Agents for Context Isolation](/docs/methodology/lesson-5-grounding#solution-2-sub-agents-for-context-isolation) for architecture details.
+:::note No Longer Recommended: MCP Browser Servers
+Previous recommendations included Playwright MCP and Chrome DevTools MCP. This guide no longer recommends them for agentic workflows, because agent-browser's ref-based approach delivers more reliable automation with lower token overhead. Both MCP servers are still available upstream and remain an option for existing integrations, but they are not recommended for new projects.
 :::
 
 ---