返回 Skill 列表
extension
分类: 内容与媒体无需 API Key

agent-browser

使用Vercel的agent-browser为AI代理提供的浏览器自动化CLI。这是AI驱动的浏览器自动化的最佳工具——它使用来自可访问性树的确定性引用而不是脆弱的选择器。该工具针对LLM进行了优化,具有快速的Rust CLI、JSON输出和专门构建的AI工作流程。当您需要可靠且可脚本化的浏览器自动化时,请使用此工具。

person作者: jakexiaohubgithub

agent-browser Skill

Browser automation that actually works for AI agents. Built by Vercel Labs specifically for LLM-driven workflows.

Why This Works Better Than Alternatives

1. Deterministic Refs (The Game-Changer)

Problem with traditional tools:

  • CSS selectors break when websites change
  • XPath is brittle and unreadable
  • Coordinate-based clicking fails on responsive layouts
  • Vision-based approaches are slow and expensive

The agent-browser solution:

# 1. Get snapshot with stable refs
agent-browser snapshot -i --json
# Output: - button "Submit" [ref=e2]

# 2. Use that ref forever — it points to the EXACT element
agent-browser click @e2
  • Refs are deterministic@e2 always points to the same element from your snapshot
  • No DOM re-query — direct reference is faster and more reliable
  • AI-optimized — LLMs parse the accessibility tree naturally, not CSS soup

2. Accessibility Trees > Screenshots/HTML

Traditional tools give you raw HTML (noisy) or screenshots (require vision models).

agent-browser gives you the accessibility tree — a clean, semantic representation of what a human (or screen reader) would perceive:

- heading "Billing" [level=1]
- link "Make a payment" [ref=e10]
- button "Submit" [ref=e2]
- textbox "Email" [ref=e3]
  • Semantic roles (button, link, textbox, heading)
  • Human-readable labels
  • Hierarchical structure
  • Perfect for LLM comprehension

3. Built for AI Agents

| Feature | Traditional Tools | agent-browser | |---------|------------------|---------------| | Element targeting | Fragile selectors | Deterministic refs | | Page understanding | Raw HTML | Accessibility tree | | Output format | Text logs | Structured JSON | | Speed | Slow (full browser per command) | Fast (daemon persists) | | AI integration | Afterthought | Purpose-built |

4. Fast Architecture

  • Rust CLI — Native binary, instant command parsing
  • Node.js Daemon — Browser stays warm between commands
  • First command: ~2s (daemon startup)
  • Subsequent commands: ~100ms

Prerequisites

npm install -g agent-browser
agent-browser install  # Download Chromium (~30s)

Core AI Workflow

The workflow designed for LLM agents:

# Step 1: Navigate
agent-browser open https://example.com

# Step 2: Get structured snapshot (the AI "sees" the page)
agent-browser snapshot -i --json

# Step 3: AI picks refs from JSON, execute actions
agent-browser click @e2
agent-browser fill @e3 "test@example.com"

# Step 4: Re-snapshot after changes (state verification)
agent-browser snapshot -i --json

# Step 5: Done
agent-browser close

Commands

Navigation

agent-browser open example.com
agent-browser open example.com --json            # JSON response
agent-browser open example.com --headed          # Visible browser

Snapshot (The Killer Feature)

agent-browser snapshot                           # Full accessibility tree
agent-browser snapshot -i                        # Interactive only (faster)
agent-browser snapshot -i --json                 # JSON for AI parsing
agent-browser snapshot -i -c -d 5 --json         # Compact, depth-limited

Interaction (Using Deterministic Refs)

agent-browser click @e2                          # Click element @e2
agent-browser fill @e3 "text"                    # Fill and clear
agent-browser type @e3 "text"                    # Type without clearing
agent-browser press Enter                        # Press key
agent-browser hover @e4                          # Hover

State Verification

agent-browser get text @e1                       # Get element text
agent-browser get url                            # Current URL
agent-browser is visible @e2                     # Check visibility

Session Management

agent-browser --session login open site.com      # Isolated session
agent-browser --profile ~/.myprofile open site   # Persistent cookies
agent-browser close                              # Clean up

Selector Strategies (Ranked by Reliability)

1. Refs (Best - Use These)

# From snapshot output — deterministic and stable
agent-browser click @e2
agent-browser fill @e3 "text"

2. Semantic Locators (Good)

agent-browser find role button click --name "Submit"
agent-browser find label "Email" fill "test@test.com"

3. CSS Selectors (Okay for static sites)

agent-browser click "#submit"
agent-browser click ".btn-primary"

4. Text/XPath (Last resort)

agent-browser click "text=Submit"
agent-browser click "xpath=//button[1]"

Snapshot Options

Control what the AI "sees":

| Flag | Purpose | |------|---------| | -i | Interactive elements only (buttons, links, inputs) — recommended | | -C | Include cursor-interactive elements (onclick, cursor:pointer) | | -c | Compact (remove empty structural elements) | | -d <n> | Limit tree depth | | -s <sel> | Scope to CSS selector (e.g., #main) | | --json | Machine-readable JSON output — essential for AI |

Recommended AI command:

agent-browser snapshot -i -c --json

Options

| Flag | Description | |------|-------------| | --json | JSON output with success/data/error structure | | --headed | Show browser window (for debugging) | | --session <name> | Isolated browser session | | --profile <path> | Persistent profile for cookies/logins | | --cdp <port> | Connect to existing Chrome via DevTools Protocol | | --headers <json> | Set auth headers per origin |

Example: Complete Login Flow

# Start
agent-browser open https://portal.aeronetpr.com

# Get page structure
SNAPSHOT=$(agent-browser snapshot -i --json)
# AI parses JSON: sees textbox @e1 (Username), textbox @e2 (Password), button @e3 (Login)

# Execute login
agent-browser fill @e1 "username"
agent-browser fill @e2 "password"
agent-browser click @e3

# Verify success (wait for navigation, re-snapshot)
sleep 2
agent-browser snapshot -i --json

# Done
agent-browser close

Tips for AI Agents

  1. Always use --json — Structured output is easier to parse than text
  2. Use -i flag — Interactive-only snapshots are smaller, faster, cleaner
  3. Re-snapshot after actions — Verify state changed as expected
  4. Trust refs over selectors@e2 from snapshot > #id that might change
  5. Use semantic locators when refs expirefind role button click is robust
  6. Session persistence — One open, many commands, one close

Comparison to Other Tools

| Tool | Best For | Why agent-browser Wins | |------|----------|------------------------| | Puppeteer/Playwright | Dev testing | Built for humans; brittle selectors | | Selenium | Legacy testing | Slow, heavy, selector-based | | browser-use | Python agents | agent-browser has better refs system | | Screenshot + Vision | Visual tasks | agent-browser is 10x faster, 100x cheaper | | OpenClaw browser tool | Simple tasks | agent-browser handles complex flows better |

When to Use This Skill

Use agent-browser when:

  • Automating multi-step web workflows
  • Filling complex forms
  • Need reliable, repeatable automation
  • Working with dynamic/modern web apps
  • Cost matters (no vision API calls)

Use OpenClaw's built-in browser tool when:

  • Simple single-page checks
  • Quick screenshot needed
  • Already authenticated session in Chrome

Resources

  • Vercel Labs repo: https://github.com/vercel-labs/agent-browser
  • This skill repo: https://github.com/clawdbrunner/skill-agent-browser