# Stealth Browser Automation
Silent, detection-resistant web automation that combines multiple anti-detection layers.
## Quick Login Flow (Important)

When the user asks to log in to any website:

1. **Open in headed mode** (a visible browser, for manual login):
   ```bash
   python scripts/stealth_session.py -u "https://target.com/login" -s sitename --headed
   ```
2. **The user logs in manually** in the visible browser.
3. **Save the session** once the login is confirmed:
   ```bash
   python scripts/stealth_session.py -u "https://target.com" -s sitename --headed --save
   ```
4. **Future use**: load the saved session (headless mode):
   ```bash
   python scripts/stealth_session.py -u "https://target.com" -s sitename --load
   ```

Sessions are stored at `~/.clawdbot/browser-sessions/<sitename>.json`.
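## Execution Strategy (Important)

### 1. Silent first, visible second

- Try headless mode first so the attempt runs silently.
- If it fails or a CAPTCHA is required, switch to headed (visible) mode.
- Avoid interrupting whatever the user is doing.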
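### 2. Resumable tasks

Long-running tasks use `task_runner.py` to manage state:

```python
from task_runner import TaskRunner

items = range(100)  # placeholder: replace with the real work items

task = TaskRunner('my_task')
task.set_total(100)
for i in items:
    if task.is_completed(i):
        continue  # skip items finished in an earlier run
    # process item i here ...
    task.mark_completed(i)
task.finish()
```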
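`task_runner.py` itself is not shown in this document, so the following is only a minimal sketch of how such checkpointing could work, not the actual `TaskRunner` implementation. The class name `CheckpointStore`, the JSON file format, and the `save_every` parameter are all hypothetical.

```python
import json
from pathlib import Path


class CheckpointStore:
    """Hypothetical stand-in for TaskRunner: tracks completed item IDs in a JSON file."""

    def __init__(self, path, save_every=50):
        self.path = Path(path)
        self.save_every = save_every  # assumed batch size between writes
        self._unsaved = 0
        # Resume from a previous run if a checkpoint file already exists.
        if self.path.exists():
            self.completed = set(json.loads(self.path.read_text()))
        else:
            self.completed = set()

    def is_completed(self, item_id):
        return item_id in self.completed

    def mark_completed(self, item_id):
        self.completed.add(item_id)
        self._unsaved += 1
        # Write to disk only every `save_every` items to limit I/O.
        if self._unsaved >= self.save_every:
            self.flush()

    def flush(self):
        """Persist the completed set; call this once more when the task ends."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(sorted(self.completed)))
        self._unsaved = 0
```

Batching the writes trades a small amount of possible rework after a crash (at most `save_every - 1` items) for far fewer disk writes.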
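### 3. Timeout handling

- Default per-page timeout: 30 seconds
- Long tasks save progress every 50 items
- Failed operations are retried automatically, up to 3 times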
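The retry rule above could be sketched roughly as follows. This is an illustration rather than the project's own code: the helper name `with_retries` and its `delay_seconds` parameter are assumptions, and whether "3 times" means three attempts in total or three retries after the first failure is not stated in this document (the sketch assumes three attempts in total).

```python
import time


def with_retries(action, attempts=3, delay_seconds=1.0):
    """Call action() up to `attempts` times; re-raise the last error if all fail."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # deliberately broad for a sketch
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)  # brief pause before the next try
    raise last_error
```

A page load would then be wrapped as `with_retries(lambda: page.get(url))`, with the 30-second limit enforced by the browser library's own timeout setting.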
### 4. Log every attempt

All login attempts are recorded in `~/.clawdbot/browser-sessions/attempts.json`.
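The layout of `attempts.json` is not specified here. As a sketch only, assuming the file holds a JSON list of records, an append helper might look like this; the function name `record_attempt` and the fields `site`, `success`, `note`, and `timestamp` are hypothetical.

```python
import json
import time
from pathlib import Path


def record_attempt(log_path, site, success, note=""):
    """Append one login-attempt record to a JSON list file and return the full list."""
    path = Path(log_path)
    # Start from the existing log if there is one, otherwise from an empty list.
    attempts = json.loads(path.read_text()) if path.exists() else []
    attempts.append({
        "site": site,
        "success": success,
        "note": note,
        "timestamp": time.time(),  # seconds since the epoch
    })
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(attempts, indent=2))
    return attempts
```

For example, `record_attempt(Path.home() / ".clawdbot/browser-sessions/attempts.json", "sitename", False, "captcha required")` would note a failed headless attempt before the switch to headed mode.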
## Architecture

```
┌─────────────────────────────────────────────────────┐
│                   Stealth Browser                   │
├─────────────────────────────────────────────────────┤
│  Layer 1: Anti-Detection Engine                     │
│  - puppeteer-extra-plugin-stealth                   │
│  - Browser fingerprint spoofing                     │
│  - WebGL/Canvas/Audio fingerprint masking           │
├─────────────────────────────────────────────────────┤
│  Layer 2: Challenge Bypass                          │
│  - Cloudflare Turnstile/JS Challenge                │
│  - hCaptcha / reCAPTCHA integration                 │
│  - 2Captcha / Anti-Captcha API                      │
├─────────────────────────────────────────────────────┤
│  Layer 3: Session Persistence                       │
│  - Cookie storage (JSON/SQLite)                     │
│  - localStorage sync                                │
│  - Multi-profile management                         │
├─────────────────────────────────────────────────────┤
│  Layer 4: Proxy & Identity                          │
│  - Rotating residential proxies                     │
│  - User-Agent rotation                              │
│  - Timezone/Locale spoofing                         │
└─────────────────────────────────────────────────────┘
```
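## Installation

### Install core dependencies

```bash
# puppeteer-extra wraps puppeteer, so puppeteer itself must be installed too
npm install -g puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
npm install -g playwright
pip install undetected-chromedriver DrissionPage
```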
### Optional: CAPTCHA solvers

Store the API keys in `~/.clawdbot/secrets/captcha.json`:

```json
{
  "2captcha": "YOUR_2CAPTCHA_KEY",
  "anticaptcha": "YOUR_ANTICAPTCHA_KEY",
  "capsolver": "YOUR_CAPSOLVER_KEY"
}
```
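Reading a key back from that file could look roughly like the sketch below. The helper name `load_captcha_key` is hypothetical, and treating values that still start with `YOUR_` as "not configured" is an assumption based on the placeholder values in the template above.

```python
import json
from pathlib import Path


def load_captcha_key(provider, path=None):
    """Return the API key for `provider`, or None if it is missing or still a placeholder."""
    if path is None:
        path = Path.home() / ".clawdbot" / "secrets" / "captcha.json"
    path = Path(path)
    if not path.exists():
        return None
    key = json.loads(path.read_text()).get(provider)
    # Template values such as "YOUR_2CAPTCHA_KEY" mean the key was never filled in.
    if not key or key.startswith("YOUR_"):
        return None
    return key
```

Returning `None` rather than raising lets callers skip automated solving when no key is configured, which matches the keys being optional.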
### Optional: proxy configuration

Store it in `~/.clawdbot/secrets/proxies.json`:

```json
{
  "rotating": "http://user:pass@host:port",
  "residential": ["socks5://ip1:port", "socks5://ip2:port"],
  "datacenter": "http://dc-proxy:port"
}
```
## Quick Start

### 1. Stealth session (Python, recommended)

```python
# scripts/stealth_session.py - use for maximum compatibility
import undetected_chromedriver as uc
from DrissionPage import ChromiumPage

# Option A: undetected-chromedriver (Selenium-based)
driver = uc.Chrome(headless=True, use_subprocess=True)
driver.get("https://nowsecure.nl")  # Test anti-detection

# Option B: DrissionPage (faster, native Python)
page = ChromiumPage()
page.get("https://cloudflare-protected-site.com")
```
### 2. Stealth session (Node.js)

```javascript
// scripts/stealth.mjs
import puppeteer from 'puppeteer-extra';
import StealthPlugin from 'puppeteer-extra-plugin-stealth';

puppeteer.use(StealthPlugin());

const browser = await puppeteer.launch({
  headless: 'new',
  args: [
    '--disable-blink-features=AutomationControlled',
    '--disable-dev-shm-usage',
    '--no-sandbox'
  ]
});

const page = await browser.newPage();
await page.goto('https://bot.sannysoft.com'); // Verify stealth
```
## Core Operations

### Open a stealth page

```bash
# Using agent-browser with stealth profile
agent-browser --profile ~/.stealth-profile open https://target.com

# Or via script
python scripts/stealth_open.py --url "https://target.com" --headless
```
### Bypass Cloudflare

```python
# Automatic CF bypass with DrissionPage
from DrissionPage import ChromiumPage

page = ChromiumPage()
page.get("https://cloudflare-site.com")  # DrissionPage waits for CF challenge automatically

# Manual wait if needed ("#main-content" assumes the target element has that id)
page.wait.ele_displayed("#main-content", timeout=30)
```
For stubborn Cloudflare sites, use FlareSolverr:

```bash
# Start FlareSolverr container
docker run -d --name flaresolverr -p 8191:8191 ghcr.io/flaresolverr/flaresolverr

# Request clearance
curl -X POST http://localhost:8191/v1 \
  -H "Content-Type: application/json" \
  -d '{"cmd":"request.get","url":"https://cf-protected.com","maxTimeout":60000}'
```
### Solve CAPTCHAs

```python
# scripts/solve_captcha.py
import time

import requests


def solve_recaptcha(site_key, page_url, api_key):
    """Solve reCAPTCHA v2/v3 via 2Captcha. Returns the token, or None on failure."""
    # Submit task
    resp = requests.post("https://2captcha.com/in.php", data={
        "key": api_key,
        "method": "userrecaptcha",
        "googlekey": site_key,
        "pageurl": page_url,
        "json": 1,
    }, timeout=30).json()
    if resp.get("status") != 1:
        return None  # submission rejected; resp["request"] holds the error code
    task_id = resp["request"]

    # Poll for result (up to 60 polls, 3 seconds apart)
    for _ in range(60):
        time.sleep(3)
        result = requests.get("https://2captcha.com/res.php", params={
            "key": api_key, "action": "get", "id": task_id, "json": 1,
        }, timeout=30).json()
        if result["status"] == 1:
            return result["request"]  # Token
        if result["request"] != "CAPCHA_NOT_READY":
            return None  # a real error rather than "still working"
    return None


def solve_hcaptcha(site_key, page_url, api_key):
    """Solve hCaptcha via Anti-Captcha. Returns the token, or None on failure."""
    resp = requests.post("https://api.anti-captcha.com/createTask", json={
        "clientKey": api_key,
        "task": {
            "type": "HCaptchaTaskProxyless",
            "websiteURL": page_url,
            "websiteKey": site_key,
        },
    }, timeout=30).json()
    if resp.get("errorId"):
        return None  # task creation failed
    task_id = resp["taskId"]

    for _ in range(60):
        time.sleep(3)
        result = requests.post("https://api.anti-captcha.com/getTaskResult", json={
            "clientKey": api_key,
            "taskId": task_id,
        }, timeout=30).json()
        if result.get("status") == "ready":
            return result["solution"]["gRecaptchaResponse"]
    return None
```
### Persist sessions

```python
# scripts/session_manager.py
import json
from pathlib import Path

SESSIONS_DIR = Path.home() / ".clawdbot" / "browser-sessions"
SESSIONS_DIR.mkdir(parents=True, exist_ok=True)


def save_cookies(driver, session_name):
    """Save cookies to JSON (driver is a Selenium-style WebDriver)."""
    cookies = driver.get_cookies()
    path = SESSIONS_DIR / f"{session_name}_cookies.json"
    path.write_text(json.dumps(cookies, indent=2))
    return path


def load_cookies(driver, session_name):
    """Load cookies from a saved session.

    Selenium only accepts cookies for the current domain, so navigate to
    the target site before calling this.
    """
    path = SESSIONS_DIR / f"{session_name}_cookies.json"
    if path.exists():
        cookies = json.loads(path.read_text())
        for cookie in cookies:
            driver.add_cookie(cookie)
        return True
    return False


def save_local_storage(page, session_name):
    """Save localStorage (page is a Playwright-style page with evaluate())."""
    ls = page.evaluate("() => JSON.stringify(localStorage)")
    path = SESSIONS_DIR / f"{session_name}_localStorage.json"
    path.write_text(ls)
    return path


def load_local_storage(page, session_name):
    """Restore localStorage from a saved session."""
    path = SESSIONS_DIR / f"{session_name}_localStorage.json"
    if path.exists():
        data = path.read_text()
        page.evaluate(
            "(data) => { Object.entries(JSON.parse(data))"
            ".forEach(([k, v]) => localStorage.setItem(k, v)) }",
            data,
        )
        return True
    return False
```
### Silent automation workflow

```python
# Complete silent automation example
import json
from pathlib import Path

from DrissionPage import ChromiumPage, ChromiumOptions

# Configure for stealth
options = ChromiumOptions()
options.headless()
options.set_argument('--disable-blink-features=AutomationControlled')
options.set_argument('--disable-dev-shm-usage')
options.set_user_agent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36')

page = ChromiumPage(options)

# Navigate with CF bypass
page.get("https://target-site.com")

# Wait for any challenges
page.wait.doc_loaded()

# Interact silently
page.ele("@id=username").input("user@example.com")
page.ele("@id=password").input("password123")
page.ele("@type=submit").click()

# Save session for reuse. "~" is not expanded inside a plain string, so build
# the path from Path.home(). Assumes page.cookies() returns a list of cookie
# dicts, as in DrissionPage 4.x.
session_file = Path.home() / ".clawdbot" / "browser-sessions" / "target-site.json"
session_file.parent.mkdir(parents=True, exist_ok=True)
session_file.write_text(json.dumps(page.cookies(), indent=2))
```
## Proxy Rotation

```python
# scripts/proxy_rotate.py
import json
import random
from pathlib import Path

from DrissionPage import ChromiumOptions, ChromiumPage


def get_proxy():
    """Get a random residential proxy, falling back to the rotating endpoint."""
    config = json.loads((Path.home() / ".clawdbot/secrets/proxies.json").read_text())
    proxies = config.get("residential", [])
    return random.choice(proxies) if proxies else config.get("rotating")


# Use with DrissionPage
options = ChromiumOptions()
options.set_proxy(get_proxy())
page = ChromiumPage(options)
```
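## Required User Input

To finish setting up this skill, please provide:

1. **CAPTCHA API keys** (optional but recommended):
   - 2Captcha key: https://2captcha.com
   - Anti-Captcha key: https://anti-captcha.com
   - CapSolver key: https://capsolver.com
2. **Proxy configuration** (optional):
   - Credentials for a residential proxy provider
   - Or a list of SOCKS5/HTTP proxies
3. **Target sites** (for pre-configured sessions):
   - Which sites need a persistent login?
   - Which credentials should be stored?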
## File Structure

```
stealth-browser/
├── SKILL.md
├── scripts/
│   ├── stealth_session.py   # Main stealth browser wrapper
│   ├── solve_captcha.py     # CAPTCHA solving utilities
│   ├── session_manager.py   # Cookie/localStorage persistence
│   ├── proxy_rotate.py      # Proxy rotation
│   └── cf_bypass.py         # Cloudflare-specific bypass
└── references/
    ├── fingerprints.md      # Browser fingerprint details
    └── detection-tests.md   # Sites to test anti-detection
```
## Testing Anti-Detection

```bash
# Run these to verify stealth is working:
python scripts/stealth_open.py --url "https://bot.sannysoft.com"
python scripts/stealth_open.py --url "https://nowsecure.nl"
python scripts/stealth_open.py --url "https://arh.antoinevastel.com/bots/areyouheadless"
python scripts/stealth_open.py --url "https://pixelscan.net"
```
## Integration with agent-browser

For simple tasks, use agent-browser with a persistent profile:

```bash
# Create stealth profile once
agent-browser --profile ~/.stealth-profile --headed open https://login-site.com
# Login manually, then close

# Reuse authenticated session (headless)
agent-browser --profile ~/.stealth-profile snapshot
agent-browser --profile ~/.stealth-profile click @e5
```

For sites protected by Cloudflare or heavy on CAPTCHAs, use the Python scripts instead.
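## Best Practices

1. **Use `headless: 'new'`** rather than `headless: true` (harder to detect). This applies to Puppeteer versions that distinguish the two; in newer releases `headless: true` already selects the new headless mode.
2. **Rotate User-Agents**, keeping each one consistent with the actual browser version.
3. **Add random delays between actions** (100-500 ms).
4. **Use residential proxies for sensitive targets.**
5. **Save the session after a successful login.**
6. **Test on bot.sannysoft.com before production use.**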
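The random delay from item 3 could be implemented along these lines. This is a minimal sketch: the helper name `human_pause` is hypothetical, and the uniform distribution is an assumption, since the text only gives the 100-500 ms range.

```python
import random
import time


def human_pause(min_ms=100, max_ms=500):
    """Sleep for a random duration in [min_ms, max_ms] and return it in seconds."""
    seconds = random.uniform(min_ms, max_ms) / 1000.0
    time.sleep(seconds)
    return seconds
```

Calling `human_pause()` between successive clicks or inputs spaces the actions out; the returned value is handy for logging.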