
YouTube Apify Transcript

Fetch YouTube transcripts via the APIFY API. Works from cloud IPs (Hetzner, AWS, etc.) by bypassing YouTube's bot detection. The free tier includes $5/month in credits (approx. …

Introduction

# youtube-apify-transcript

Fetch YouTube transcripts via the APIFY API. It runs from cloud IPs and bypasses YouTube's bot detection.

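## Why APIFY?

YouTube blocks transcript requests that come from cloud IPs (AWS, GCP, etc.). APIFY routes requests through residential proxies, which reliably bypasses the bot detection.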

## Free Tier

- **$5 in free credits per month** (about 714 videos)
- No credit card required
- Well suited to personal use

## Cost

- **$0.007 per video** (less than 1 cent!)
- Track your usage here: https://console.apify.com/billing
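
The "about 714 videos" figure follows directly from the two numbers above ($5 of credit at $0.007 per video). The snippet below is only an illustrative sketch of that arithmetic; the function name `videos_covered` is hypothetical and is not part of the skill.

```python
from decimal import Decimal


def videos_covered(credit_usd: str, cost_per_video_usd: str) -> int:
    """Return how many whole videos a given credit can pay for.

    Decimal is used instead of float so the division is exact.
    """
    return int(Decimal(credit_usd) // Decimal(cost_per_video_usd))


if __name__ == "__main__":
    # $5 monthly credit at $0.007 per video
    print(videos_covered("5", "0.007"))  # 714
```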

## Links

- 🔗 [APIFY pricing](https://apify.com/pricing)
- 🔑 [Get an API key](https://console.apify.com/account/integrations)
- 🎬 [YouTube Transcripts Actor](https://apify.com/karamelo/youtube-transcripts)

## Setup

1. Create a free APIFY account: https://apify.com/
2. Get your API token: https://console.apify.com/account/integrations
3. Set the environment variable:

```bash
# Add to ~/.bashrc or ~/.zshrc
export APIFY_API_TOKEN="apify_api_YOUR_TOKEN_HERE"

# Or use .env file (never commit this!)
echo 'APIFY_API_TOKEN=apify_api_YOUR_TOKEN_HERE' >> .env
```
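
Before running anything, it can help to confirm the token is actually visible to Python. The following is a minimal sketch, not the skill's own code: the helper name `get_apify_token` is hypothetical, and it only checks the `APIFY_API_TOKEN` variable named above (it does not read a `.env` file).

```python
import os
from typing import Mapping, Optional


def get_apify_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the APIFY token from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get("APIFY_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "APIFY_API_TOKEN is not set; export it in your shell first"
        )
    return token


if __name__ == "__main__":
    try:
        get_apify_token()
        print("APIFY_API_TOKEN is set")
    except RuntimeError as exc:
        print(exc)
```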

## Usage

### Basic Usage

```bash
# Get transcript as text (uses cache by default)
python3 scripts/fetch_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID"

# Short URL also works
python3 scripts/fetch_transcript.py "https://youtu.be/VIDEO_ID"
```
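
Both URL shapes above carry the same video ID. How `fetch_transcript.py` parses them is not shown here, so the following is only an assumed sketch of one way to normalize the two forms; `extract_video_id` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch URL or a youtu.be short URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    if host == "youtu.be":
        # Short form: the ID is the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        # Long form: the ID is the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    return None
```

This sketch deliberately covers only the two forms shown above; other URL shapes return `None`.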

### Options

```bash
# Output to file
python3 scripts/fetch_transcript.py "URL" --output transcript.txt

# JSON format (includes timestamps)
python3 scripts/fetch_transcript.py "URL" --json

# Both: JSON to file
python3 scripts/fetch_transcript.py "URL" --json --output transcript.json

# Specify language preference
python3 scripts/fetch_transcript.py "URL" --lang de
```

### Caching (Saves Money!)

By default, transcripts are cached locally. Repeat requests for the same video cost $0.

```bash
# First request: fetches from APIFY ($0.007)
python3 scripts/fetch_transcript.py "URL"

# Second request: uses cache (FREE!)
python3 scripts/fetch_transcript.py "URL"
# Output: [cached] Transcript for: VIDEO_ID

# Bypass cache (force fresh fetch)
python3 scripts/fetch_transcript.py "URL" --no-cache

# View cache stats
python3 scripts/fetch_transcript.py --cache-stats

# Clear all cached transcripts
python3 scripts/fetch_transcript.py --clear-cache
```

Cache location: `.cache/` inside the skill directory (can be overridden with the `YT_TRANSCRIPT_CACHE_DIR` environment variable).
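
To make the caching behavior concrete, here is a rough sketch of a file-based cache keyed by video ID. The on-disk layout the script really uses is not documented here, so the one-JSON-file-per-video scheme and the helper names (`cache_dir`, `load_cached`, `save_cached`) are assumptions for illustration. Only the `.cache/` default and the `YT_TRANSCRIPT_CACHE_DIR` override come from the text above.

```python
import json
import os
from pathlib import Path
from typing import Optional


def cache_dir(default: str = ".cache") -> Path:
    """Resolve the cache directory, honoring YT_TRANSCRIPT_CACHE_DIR."""
    return Path(os.environ.get("YT_TRANSCRIPT_CACHE_DIR", default))


def load_cached(video_id: str) -> Optional[dict]:
    """Return the cached transcript for video_id, or None on a cache miss."""
    path = cache_dir() / f"{video_id}.json"
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def save_cached(video_id: str, data: dict) -> None:
    """Write a transcript to the cache (assumed layout: <video_id>.json)."""
    directory = cache_dir()
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{video_id}.json"
    path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
```

With a scheme like this, a second request for the same ID is served from disk and never reaches the paid API.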

### Batch Mode

Process multiple videos in one run:

```bash
# Create a file with URLs (one per line)
cat > urls.txt << EOF
https://youtube.com/watch?v=VIDEO1
https://youtu.be/VIDEO2
https://youtube.com/watch?v=VIDEO3
EOF

# Process all URLs
python3 scripts/fetch_transcript.py --batch urls.txt

# Output:
# [1/3] Fetching VIDEO1...
# [2/3] [cached] VIDEO2
# [3/3] Fetching VIDEO3...
# Batch complete: 2 fetched, 1 cached, 0 failed
# [Cost: ~$0.014 for 2 API call(s)]

# Batch with JSON output to file
python3 scripts/fetch_transcript.py --batch urls.txt --json --output all_transcripts.json
```
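
The batch summary above can be reproduced with a small tally. This is a sketch under assumptions, not the script's implementation: `read_urls` and `summarize` are hypothetical helpers, and the cost estimate assumes only freshly fetched videos are billed, at the $0.007 rate stated earlier.

```python
from typing import List

COST_PER_CALL_USD = 0.007  # per-video price quoted in this document


def read_urls(text: str) -> List[str]:
    """Split the contents of a URL file into non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def summarize(statuses: List[str]) -> str:
    """Build a summary from per-video statuses: 'fetched', 'cached' or 'failed'."""
    fetched = statuses.count("fetched")
    cached = statuses.count("cached")
    failed = statuses.count("failed")
    cost = fetched * COST_PER_CALL_USD  # assumption: cached videos cost nothing
    return (
        f"Batch complete: {fetched} fetched, {cached} cached, {failed} failed\n"
        f"[Cost: ~${cost:.3f} for {fetched} API call(s)]"
    )


if __name__ == "__main__":
    print(summarize(["fetched", "cached", "fetched"]))
```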

### Output Formats

**Text (default):**
```
Hello and welcome to this video. Today we're going to talk about...
```

**JSON (--json):**
```json
{
  "video_id": "dQw4w9WgXcQ",
  "title": "Video Title",
  "transcript": [
    {"start": 0.0, "duration": 2.5, "text": "Hello and welcome"},
    {"start": 2.5, "duration": 3.0, "text": "to this video"}
  ],
  "full_text": "Hello and welcome to this video..."
}
```
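
As an example of consuming the JSON form, the sketch below turns each segment into a timestamped line. It relies only on the `transcript`, `start` and `text` fields shown in the sample above; the function names are hypothetical.

```python
from typing import List


def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as HH:MM:SS (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_timestamped_lines(payload: dict) -> List[str]:
    """Render each transcript segment as '[HH:MM:SS] text'."""
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text']}"
        for seg in payload["transcript"]
    ]


if __name__ == "__main__":
    sample = {
        "transcript": [
            {"start": 0.0, "duration": 2.5, "text": "Hello and welcome"},
            {"start": 2.5, "duration": 3.0, "text": "to this video"},
        ]
    }
    for line in to_timestamped_lines(sample):
        print(line)
```

In practice the payload would come from `json.load` on a file written with `--json --output transcript.json`.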

## Error Handling

The script handles common errors:

- Invalid YouTube URL
- Video has no transcript
- API quota exceeded
- Network errors

## Metadata

```yaml
metadata:
  clawdbot:
    emoji: "📹"
    requires:
      env: ["APIFY_API_TOKEN"]
      bins: ["python3"]
```
