ClawSkills logoClawSkills

Browser Use

自动化浏览器交互,用于网页测试、表单填充、屏幕截图和数据提取。当用户需要浏览网站、与 w 交互时使用...

介绍

# Browser Automation with browser-use CLI

`browser-use` 命令提供快速、持久的浏览器自动化功能。它在命令之间维护浏览器会话,从而实现复杂的多步骤工作流。

## 前置条件

在使用此功能之前,必须安装并配置 `browser-use`。运行诊断以进行验证:

```bash browser-use doctor ```

更多信息,请参阅 https://github.com/browser-use/browser-use/blob/main/browser_use/skill_cli/README.md

## 核心工作流

1. **导航**:`browser-use open <url>` - 打开 URL(如需要则启动浏览器) 2. **检查**:`browser-use state` - 返回带有索引的可点击元素 3. **交互**:使用状态中的索引进行交互(`browser-use click 5`,`browser-use input 3 "text"`) 4. **验证**:`browser-use state` 或 `browser-use screenshot` 以确认操作 5. **重复**:浏览器在命令之间保持打开状态

## 浏览器模式

```bash browser-use --browser chromium open <url> # Default: headless Chromium browser-use --browser chromium --headed open <url> # Visible Chromium window browser-use --browser real open <url> # Real Chrome (no profile = fresh) browser-use --browser real --profile "Default" open <url> # Real Chrome with your login sessions browser-use --browser remote open <url> # Cloud browser ```

- **chromium**:快速、隔离、默认无头模式 - **real**:使用真实的 Chrome 二进制文件。如果不带 `--profile`,则使用位于 `~/.config/browseruse/profiles/cli/` 的持久但空的 CLI 配置文件。如果使用 `--profile "ProfileName"`,则复制你实际的 Chrome 配置文件(cookies、登录信息、扩展程序) - **remote**:支持代理的云托管浏览器

## 基本命令

```bash # Navigation browser-use open <url> # Navigate to URL browser-use back # Go back browser-use scroll down # Scroll down (--amount N for pixels)

# Page State (always run state first to get element indices) browser-use state # Get URL, title, clickable elements browser-use screenshot # Take screenshot (base64) browser-use screenshot path.png # Save screenshot to file

# Interactions (use indices from state) browser-use click <index> # Click element browser-use type "text" # Type into focused element browser-use input <index> "text" # Click element, then type browser-use keys "Enter" # Send keyboard keys browser-use select <index> "option" # Select dropdown option

# Data Extraction browser-use eval "document.title" # Execute JavaScript browser-use get text <index> # Get element text browser-use get html --selector "h1" # Get scoped HTML

# Wait browser-use wait selector "h1" # Wait for element browser-use wait text "Success" # Wait for text

# Session browser-use sessions # List active sessions browser-use close # Close current session browser-use close --all # Close all sessions

# AI Agent browser-use -b remote run "task" # Run agent in cloud (async by default) browser-use task status <id> # Check cloud task progress ```

## 命令

### 导航与标签页 ```bash browser-use open <url> # Navigate to URL browser-use back # Go back in history browser-use scroll down # Scroll down browser-use scroll up # Scroll up browser-use scroll down --amount 1000 # Scroll by specific pixels (default: 500) browser-use switch <tab> # Switch to tab by index browser-use close-tab # Close current tab browser-use close-tab <tab> # Close specific tab ```

### 页面状态 ```bash browser-use state # Get URL, title, and clickable elements browser-use screenshot # Take screenshot (outputs base64) browser-use screenshot path.png # Save screenshot to file browser-use screenshot --full path.png # Full page screenshot ```

### 交互 ```bash browser-use click <index> # Click element browser-use type "text" # Type text into focused element browser-use input <index> "text" # Click element, then type text browser-use keys "Enter" # Send keyboard keys browser-use keys "Control+a" # Send key combination browser-use select <index> "option" # Select dropdown option browser-use hover <index> # Hover over element (triggers CSS :hover) browser-use dblclick <index> # Double-click element browser-use rightclick <index> # Right-click element (context menu) ```

使用来自 `browser-use state` 的索引。

### JavaScript 与数据 ```bash browser-use eval "document.title" # Execute JavaScript, return result browser-use get title # Get page title browser-use get html # Get full page HTML browser-use get html --selector "h1" # Get HTML of specific element browser-use get text <index> # Get text content of element browser-use get value <index> # Get value of input/textarea browser-use get attributes <index> # Get all attributes of element browser-use get bbox <index> # Get bounding box (x, y, width, height) ```

### Cookies ```bash browser-use cookies get # Get all cookies browser-use cookies get --url <url> # Get cookies for specific URL browser-use cookies set <name> <value> # Set a cookie browser-use cookies set name val --domain .example.com --secure --http-only browser-use cookies set name val --same-site Strict # SameSite: Strict, Lax, or None browser-use cookies set name val --expires 1735689600 # Expiration timestamp browser-use cookies clear # Clear all cookies browser-use cookies clear --url <url> # Clear cookies for specific URL browser-use cookies export <file> # Export all cookies to JSON file browser-use cookies export <file> --url <url> # Export cookies for specific URL browser-use cookies import <file> # Import cookies from JSON file ```

### 等待条件 ```bash browser-use wait selector "h1" # Wait for element to be visible browser-use wait selector ".loading" --state hidden # Wait for element to disappear browser-use wait selector "#btn" --state attached # Wait for element in DOM browser-use wait text "Success" # Wait for text to appear browser-use wait selector "h1" --timeout 5000 # Custom timeout in ms ```

### Python 执行 ```bash browser-use python "x = 42" # Set variable browser-use python "print(x)" # Access variable (outputs: 42) browser-use python "print(browser.url)" # Access browser object browser-use python --vars # Show defined variables browser-use python --reset # Clear Python namespace browser-use python --file script.py # Execute Python file ```

Python 会话在命令之间保持状态。`browser` 对象提供: - `browser.url`、`browser.title`、`browser.html` — 页面信息 - `browser.goto(url)`、`browser.back()` — 导航 - `browser.click(index)`、`browser.type(text)`、`browser.input(index, text)`、`browser.keys(keys)` — 交互 - `browser.screenshot(path)`、`browser.scroll(direction, amount)` — 视觉操作 - `browser.wait(seconds)`、`browser.extract(query)` — 工具函数

### Agent 任务

#### 远程模式选项

使用 `--browser remote` 时,可以使用附加选项:

```bash # Specify LLM model browser-use -b remote run "task" --llm gpt-4o browser-use -b remote run "task" --llm claude-sonnet-4-20250514

# Proxy configuration (default: us) browser-use -b remote run "task" --proxy-country uk

# Session reuse browser-use -b remote run "task 1" --keep-alive # Keep session alive after task browser-use -b remote run "task 2" --session-id abc-123 # Reuse existing session

# Execution modes browser-use -b remote run "task" --flash # Fast execution mode browser-use -b remote run "task" --wait # Wait for completion (default: async)

# Advanced options browser-use -b remote run "task" --thinking # Extended reasoning mode browser-use -b remote run "task" --no-vision # Disable vision (enabled by default)

# Using a cloud profile (create session first, then run with --session-id) browser-use session create --profile <cloud-profile-id> --keep-alive # → returns session_id browser-use -b remote run "task" --session-id <session-id>

# Task configuration browser-use -b remote run "task" --start-url https://example.com # Start from specific URL browser-use -b remote run "task" --allowed-domain example.com # Restrict navigation (repeatable) browser-use -b remote run "task" --metadata key=value # Task metadata (repeatable) browser-use -b remote run "task" --skill-id skill-123 # Enable skills (repeatable) browser-use -b remote run "task" --secret key=value # Secret metadata (repeatable)

# Structured output and evaluation browser-use -b remote run "task" --structured-output '{"type":"object"}' # JSON schema for output browser-use -b remote run "task" --judge # Enable judge mode browser-use -b remote run "task" --judge-ground-truth "expected answer" ```

### 任务管理 ```bash browser-use task list # List recent tasks browser-use task list --limit 20 # Show more tasks browser-use task list --status finished # Filter by status (finished, stopped) browser-use task list --session <id> # Filter by session ID browser-use task list --json # JSON output

browser-use task status <task-id> # Get task status (latest step only) browser-use task status <task-id> -c # All steps with reasoning browser-use task status <task-id> -v # All steps with URLs + actions browser-use task status <task-id> --last 5 # Last N steps only browser-use task status <task-id> --step 3 # Specific step number browser-use task status <task-id> --reverse # Newest first

browser-use task stop <task-id> # Stop a running task browser-use task logs <task-id> # Get task execution logs ```

### 云会话管理 ```bash browser-use session list # List cloud sessions browser-use session list --limit 20 # Show more sessions browser-use session list --status active # Filter by status browser-use session list --json # JSON output

browser-use session get <session-id> # Get session details + live URL browser-use session get <session-id> --json

browser-use session stop <session-id> # Stop a session browser-use session stop --all # Stop all active sessions

browser-use session create # Create with defaults browser-use session create --profile <id> # With cloud profile browser-use session create --proxy-country uk # With geographic proxy browser-use session create --start-url https://example.com browser-use session create --screen-size 1920x1080 browser-use session create --keep-alive browser-use session create --persist-memory

browser-use session share <session-id> # Create public share URL browser-use session share <session-id> --delete # Delete public share ```

### 隧道 ```bash browser-use tunnel <port> # Start tunnel (returns URL) browser-use tunnel <port> # Idempotent - returns existing URL browser-use tunnel list # Show active tunnels browser-use tunnel stop <port> # Stop tunnel browser-use tunnel stop --all # Stop all tunnels ```

### 会话管理 ```bash browser-use sessions # List active sessions browser-use close # Close current session browser-use close --all # Close all sessions ```

### 配置文件管理

#### 本地 Chrome 配置文件(`--browser real`) ```bash browser-use -b real profile list # List local Chrome profiles browser-use -b real profile cookies "Default" # Show cookie domains in profile ```

#### 云配置文件(`--browser remote`) ```bash browser-use -b remote profile list # List cloud profiles browser-use -b remote profile list --page 2 --page-size 50 browser-use -b remote profile get <id> # Get profile details browser-use -b remote profile create # Create new cloud profile browser-use -b remote profile create --name "My Profile" browser-use -b remote profile update <id> --name "New" browser-use -b remote profile delete <id> ```

#### 同步 ```bash browser-use profile sync --from "Default" --domain github.com # Domain-specific browser-use profile sync --from "Default" # Full profile browser-use profile sync --from "Default" --name "Custom Name" # With custom name ```

### 服务器控制 ```bash browser-use server logs # View server logs ```

## 常见工作流

### 暴露本地开发服务器

当你有一个本地开发服务器并需要云浏览器访问它时使用。

**核心工作流**:启动开发服务器 → 创建隧道 → 远程浏览隧道 URL。

```bash # 1. Start your dev server npm run dev & # localhost:3000

# 2. Expose it via Cloudflare tunnel browser-use tunnel 3000 # → url: https://abc.trycloudflare.com

# 3. Now the cloud browser can reach your local server browser-use --browser remote open https://abc.trycloudflare.com browser-use state browser-use screenshot ```

**注意**:隧道独立于浏览器会话。它们在 `browser-use close` 之后仍然存在,可以单独管理。必须安装 Cloudflared — 运行 `browser-use doctor` 进行检查。

### 使用配置文件进行身份验证浏览

当任务需要浏览用户已登录的站点(例如 Gmail、GitHub、内部工具)时使用。

**核心工作流**:检查现有配置文件 → 询问用户使用哪个配置文件和浏览器模式 → 使用该配置文件浏览。仅当没有合适的配置文件存在时才同步 cookies。

**在浏览需要身份验证的站点之前,Agent 必须:** 1. 询问用户是使用 **real**(本地 Chrome)还是 **remote**(云)浏览器 2. 列出该模式的可用配置文件 3. 询问使用哪个配置文件 4. 如果没有配置文件具有正确的 cookies,则提议进行同步(见下文)

#### 步骤 1:检查现有配置文件

```bash # Option A: Local Chrome profiles (--browser real) browser-use -b real profile list # → Default: Person 1 ([email protected]) # → Profile 1: Work ([email protected])

# Option B: Cloud profiles (--browser remote) browser-use -b remote profile list # → abc-123: "Chrome - Default (github.com)" # → def-456: "Work profile" ```

#### 步骤 2:使用选定的配置文件浏览

```bash # Real browser — uses local Chrome with existing login sessions browser-use --browser real --profile "Default" open https://github.com

# Cloud browser — uses cloud profile with synced cookies browser-use --browser remote --profile abc-123 open https://github.com ```

用户已经过身份验证 — 无需登录。

**注意**:云配置文件的 cookies 可能会随时间过期。如果身份验证失败,请从本地 Chrome 配置文件重新同步 cookies。

#### 步骤 3:同步 cookies(仅在需要时)

如果用户想要使用云浏览器但没有云配置文件具有正确的 cookies,请从本地 Chrome 配置文件同步它们。

**在同步之前,Agent 必须:** 1. 询问使用哪个本地 Chrome 配置文件 2. 询问同步哪些域名 — 请勿默认同步整个配置文件 3. 继续之前进行确认

**检查本地配置文件有哪些 cookies:** ```bash browser-use -b real profile cookies "Default" # → youtube.com: 23 # → google.com: 18 # → github.com: 2 ```

**特定域名同步(推荐):** ```bash browser-use profile sync --from "Default" --domain github.com # Creates new cloud profile: "Chrome - Default (github.com)" # Only syncs github.com cookies ```

**完整配置文件同步(请谨慎使用):** ```bash browser-use profile sync --from "Default" # Syncs ALL cookies — includes sensitive data, tracking cookies, every session token ```

仅当用户明确需要其整个浏览器状态时使用。

**细粒度控制(高级):** ```bash # Export cookies to file, manually edit, then import browser-use --browser real --profile "Default" cookies export /tmp/cookies.json browser-use --browser remote --profile <id> cookies import /tmp/cookies.json ```

**使用同步的配置文件:** ```bash browser-use --browser remote --profile <id> open https://github.com ```

### 运行子 Agent

使用云会话并行运行自主浏览器 Agent。

**核心工作流**:使用 `run` 启动任务 → 使用 `task status` 轮询 → 收集结果 → 清理会话。

- **会话 = Agent**:每个云会话都是具有自己状态的浏览器 Agent - **任务 = 工作**:分配给 Agent 的工作;一个 Agent 可以按顺序运行多个任务 - **会话生命周期**:一旦停止,会话无法恢复 — 需要启动一个新的

#### 启动任务

```bash # Single task (async by default — returns immediately) browser-use -b remote run "Search for AI news and summarize top 3 articles" # → task_id: task-abc, session_id: sess-123

# Parallel tasks — each gets its own session browser-use -b remote run "Research competitor A pricing" # → task_id: task-1, session_id: sess-a browser-use -b remote run "Research competitor B pricing" # → task_id: task-2, session_id: sess-b browser-use -b remote run "Research competitor C pricing" # → task_id: task-3, session_id: sess-c

# Sequential tasks in same session (reuses cookies, login state, etc.) browser-use -b remote run "Log into example.com" --keep-alive # → task_id: task-1, session_id: sess-123 browser-use task status task-1 # Wait for completion browser-use -b remote run "Export settings" --session-id sess-123 # → task_id: task-2, session_id: sess-123 (same session) ```

#### 管理与停止

```bash browser-use task list --status finished # See completed tasks browser-use task stop task-abc # Stop a task (session may continue if --keep-alive) browser-use session stop sess-123 # Stop an entire session (terminates its tasks) browser-use session stop --all # Stop all sessions ```

#### 监控

**任务状态设计旨在节省 Token。** 默认输出最少 — 仅在需要时展开:

| 模式 | 标志 | Token 数量 | 使用场景 | |------|------|--------|----------| | 默认 | (无) | 低 | 轮询进度 | | 紧凑 | `-c` | 中 | 需要完整推理 | | 详细 | `-v` | 高 | 调试操作 |

```bash # For long tasks (50+ steps) browser-use task status <id> -c --last 5 # Last 5 steps only browser-use task status <id> -v --step 10 # Inspect specific step ```

**实时视图**:`browser-use session get <session-id>` 返回一个实时 URL 以观察 Agent。

**检测卡住的任务**:如果 `task status` 中的成本/持续时间停止增加,则任务卡住了 — 停止它并启动一个新的 Agent。

**日志**:`browser-use task logs <task-id>` — 仅在任务完成后可用。

## 全局选项

| 选项 | 描述 | |--------|-------------| | `--session NAME` | 使用命名的会话(默认:“default”) | | `--browser MODE` | 浏览器模式:chromium、real、remote | | `--headed` | 显示浏览器窗口(chromium 模式) | | `--profile NAME` | 浏览器配置文件(本地名称或云 ID)。适用于 `open`、`session create` 等 — 不适用于 `run`(请改用 `--session-id`) | | `--json` | 输出为 JSON 格式 | | `--mcp` | 通过 stdin/stdout 作为 MCP 服务器运行 |

**会话行为**:所有不带 `--session` 的命令都使用同一个“default”会话。浏览器保持打开并在命令之间重用。使用 `--session NAME` 并行运行多个浏览器。

## 提示

1. **始终先运行 `browser-use state`** 以查看可用元素及其索引 2. **使用 `--headed` 进行调试** 以查看浏览器正在做什么 3. **会话持久化** — 浏览器在命令之间保持打开状态 4. **使用 `--json`** 以进行程序化解析 5. **Python 变量持久化** — 在会话内的 `browser-use python` 命令之间保持 6. **CLI 别名**:`bu`、`browser` 和 `browseruse` 的行为与 `browser-use` 完全相同

## 故障排除

**首先运行诊断:** ```bash browser-use doctor ```

**浏览器无法启动?** ```bash browser-use close --all # Close all sessions browser-use --headed open <url> # Try with visible window ```

**找不到元素?** ```bash browser-use state # Check current elements browser-use scroll down # Element might be below fold browser-use state # Check again ```

**会话问题?** ```bash browser-use sessions # Check active sessions browser-use close --all # Clean slate browser-use open <url> # Fresh start ```

**`task stop` 后会话重用失败**: 如果你停止任务并尝试重用其会话,新任务可能会卡在“created”状态。请改为创建一个新会话: ```bash browser-use session create --profile <profile-id> --keep-alive browser-use -b remote run "new task" --session-id <new-session-id> ```

**任务卡在“started”**:使用 `task status` 检查成本 — 如果没有增加,则任务卡住了。使用 `session get` 查看实时 URL,然后停止并启动一个新的 Agent。

**任务完成后会话仍然存在**:任务完成不会自动停止会话。运行 `browser-use session stop --all` 进行清理。

## 清理

**完成后务必关闭浏览器:**

```bash browser-use close # Close browser session browser-use session stop --all # Stop cloud sessions (if any) browser-use tunnel stop --all # Stop tunnels (if any) ```

更多产品