# youtube-apify-transcript
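Fetch YouTube transcripts via the APIFY API (works from cloud IPs and bypasses YouTube bot detection).

## Why APIFY?

YouTube blocks transcript requests that come from cloud IPs (AWS, GCP, etc.). APIFY routes requests through residential proxies, which reliably gets past the bot detection.

## Free Tier

- **$5 of free credit per month** (about 714 videos)
- No credit card required
- Well suited to personal use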
## Cost

- **$0.007 per video** (less than 1 cent!)
- Track usage here: https://console.apify.com/billing
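
The free-credit figure follows from the per-video price: $5 / $0.007 ≈ 714 videos. The sketch below reproduces that arithmetic so you can estimate a run before starting it. The function names are illustrative and are not part of the script; the prices are the ones quoted above and may change.

```python
from decimal import Decimal

# Figures quoted in this README; treat them as assumptions that may change.
PRICE_PER_VIDEO = Decimal("0.007")  # USD per fetched transcript
FREE_CREDIT = Decimal("5")          # USD of free credit per month


def videos_covered(credit: Decimal = FREE_CREDIT) -> int:
    """Return how many whole videos the given credit pays for."""
    return int(credit // PRICE_PER_VIDEO)


def estimated_cost(uncached_videos: int) -> Decimal:
    """Return the estimated cost in USD for videos that need an API call."""
    return PRICE_PER_VIDEO * uncached_videos


print(videos_covered())   # 714
print(estimated_cost(2))  # 0.014
```

`Decimal` is used instead of `float` so that small per-video prices add up without rounding noise.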
## Links

- 🔗 [APIFY Pricing](https://apify.com/pricing)
- 🔑 [Get an API key](https://console.apify.com/account/integrations)
- 🎬 [YouTube Transcripts Actor](https://apify.com/karamelo/youtube-transcripts)

## Setup

1. Create a free APIFY account: https://apify.com/
2. Get your API token: https://console.apify.com/account/integrations
3. Set the environment variable:

```bash
# Add to ~/.bashrc or ~/.zshrc
export APIFY_API_TOKEN="apify_api_YOUR_TOKEN_HERE"

# Or use .env file (never commit this!)
echo 'APIFY_API_TOKEN=apify_api_YOUR_TOKEN_HERE' >> .env
```
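
To confirm the token is visible before spending an API call, a check along these lines can help. This is a minimal sketch: `get_apify_token` and `read_env_file` are hypothetical helpers, and how the script itself reads `.env` is not documented here, so the parser below only assumes plain `KEY=value` lines.

```python
import os


def read_env_file(path: str) -> dict:
    """Parse simple KEY=value lines from a .env file (assumed format)."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_apify_token(env=None) -> str:
    """Return APIFY_API_TOKEN from the given mapping, or raise a clear error."""
    env = os.environ if env is None else env
    token = env.get("APIFY_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("APIFY_API_TOKEN is not set")
    return token
```

Calling `get_apify_token()` reads the process environment; `get_apify_token(read_env_file(".env"))` checks the file instead.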
## Usage

### Basic Usage

```bash
# Get transcript as text (uses cache by default)
python3 scripts/fetch_transcript.py "https://www.youtube.com/watch?v=VIDEO_ID"

# Short URL also works
python3 scripts/fetch_transcript.py "https://youtu.be/VIDEO_ID"
```
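
Both URL forms point at the same video ID. The script's own parsing is not shown in this README, so the following is only a sketch of how the two documented forms could be reduced to an ID with the standard library; `extract_video_id` is a hypothetical name, and other forms (such as `/shorts/` or `/embed/` links) are deliberately not handled.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch?v= or youtu.be URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        # Short form: the ID is the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in ("youtube.com", "m.youtube.com"):
        # Long form: the ID is the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))                 # dQw4w9WgXcQ
```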
### Options

```bash
# Output to file
python3 scripts/fetch_transcript.py "URL" --output transcript.txt

# JSON format (includes timestamps)
python3 scripts/fetch_transcript.py "URL" --json

# Both: JSON to file
python3 scripts/fetch_transcript.py "URL" --json --output transcript.json

# Specify language preference
python3 scripts/fetch_transcript.py "URL" --lang de
```
### Caching (saves money!)

By default, transcripts are cached locally. Repeat requests for the same video cost $0.

```bash
# First request: fetches from APIFY ($0.007)
python3 scripts/fetch_transcript.py "URL"

# Second request: uses cache (FREE!)
python3 scripts/fetch_transcript.py "URL"
# Output: [cached] Transcript for: VIDEO_ID

# Bypass cache (force fresh fetch)
python3 scripts/fetch_transcript.py "URL" --no-cache

# View cache stats
python3 scripts/fetch_transcript.py --cache-stats

# Clear all cached transcripts
python3 scripts/fetch_transcript.py --clear-cache
```
Cache location: `.cache/` in the skill directory (override with the `YT_TRANSCRIPT_CACHE_DIR` environment variable).

### Batch Mode

Process multiple videos at once:

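
The idea behind the cache can be sketched as a lookup keyed by video ID. The on-disk layout used by the script is not documented here, so the one-JSON-file-per-video layout and the helper names below are assumptions for illustration only. The sketch also falls back to a relative `.cache` directory rather than locating the skill directory.

```python
import json
import os
from pathlib import Path
from typing import Optional


def cache_dir() -> Path:
    """Resolve the cache directory, honouring YT_TRANSCRIPT_CACHE_DIR if set."""
    return Path(os.environ.get("YT_TRANSCRIPT_CACHE_DIR", ".cache"))


def load_cached(video_id: str) -> Optional[dict]:
    """Return the cached transcript for video_id, or None on a cache miss."""
    path = cache_dir() / f"{video_id}.json"  # assumed layout: one file per video
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def save_cached(video_id: str, data: dict) -> None:
    """Store a transcript so later requests for the same video are free."""
    directory = cache_dir()
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{video_id}.json"
    path.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
```

A caller would try `load_cached` first and only pay for an API call on `None`, then pass the result to `save_cached`.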
```bash
# Create a file with URLs (one per line)
cat > urls.txt << EOF
https://youtube.com/watch?v=VIDEO1
https://youtu.be/VIDEO2
https://youtube.com/watch?v=VIDEO3
EOF

# Process all URLs
python3 scripts/fetch_transcript.py --batch urls.txt

# Output:
# [1/3] Fetching VIDEO1...
# [2/3] [cached] VIDEO2
# [3/3] Fetching VIDEO3...
# Batch complete: 2 fetched, 1 cached, 0 failed
# [Cost: ~$0.014 for 2 API call(s)]

# Batch with JSON output to file
python3 scripts/fetch_transcript.py --batch urls.txt --json --output all_transcripts.json
```
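
The batch summary in the sample output above charges only for videos that were actually fetched; cached ones cost nothing. The sketch below reads a URL file and rebuilds that summary. Both helper names are hypothetical, and the sketch assumes failed requests are not billed, which this README does not state.

```python
def read_url_file(path: str) -> list:
    """Return the non-empty lines of a URL file, one URL per line."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def batch_summary(fetched: int, cached: int, failed: int,
                  price_per_video: float = 0.007) -> str:
    """Format a summary in the style of the sample batch output."""
    cost = fetched * price_per_video  # only real API calls are counted
    return (
        f"Batch complete: {fetched} fetched, {cached} cached, {failed} failed\n"
        f"[Cost: ~${cost:.3f} for {fetched} API call(s)]"
    )


print(batch_summary(fetched=2, cached=1, failed=0))
```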
### Output Formats

**Text (default):**

```
Hello and welcome to this video. Today we're going to talk about...
```

**JSON (--json):**

```json
{
  "video_id": "dQw4w9WgXcQ",
  "title": "Video Title",
  "transcript": [
    {"start": 0.0, "duration": 2.5, "text": "Hello and welcome"},
    {"start": 2.5, "duration": 3.0, "text": "to this video"}
  ],
  "full_text": "Hello and welcome to this video..."
}
```
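
The `start` values in the JSON output are offsets in seconds, which makes it easy to build a timestamped view. A small sketch, assuming the structure shown in the example above (the helper names are illustrative):

```python
import json


def format_timestamp(seconds: float) -> str:
    """Convert an offset in seconds to HH:MM:SS (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timestamped_lines(data: dict) -> list:
    """Return one '[HH:MM:SS] text' line per transcript segment."""
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text']}"
        for seg in data["transcript"]
    ]


sample = json.loads("""
{
  "video_id": "dQw4w9WgXcQ",
  "title": "Video Title",
  "transcript": [
    {"start": 0.0, "duration": 2.5, "text": "Hello and welcome"},
    {"start": 2.5, "duration": 3.0, "text": "to this video"}
  ],
  "full_text": "Hello and welcome to this video..."
}
""")

for line in timestamped_lines(sample):
    print(line)
# [00:00:00] Hello and welcome
# [00:00:02] to this video
```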
## Error Handling

The script handles common errors:

- Invalid YouTube URL
- Video has no transcript
- API quota exceeded
- Network errors
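
When calling the script from another program, it helps to turn failures into exceptions. The wrapper below is a hedged sketch, not documented behavior: it assumes the script exits with a non-zero status on the errors listed above and, with `--json`, prints only JSON on stdout. If either assumption does not hold, the wrapper raises `RuntimeError` rather than returning partial data. `run_and_parse` is a hypothetical helper.

```python
import json
import subprocess


def run_and_parse(command: list) -> dict:
    """Run a command expected to print JSON on stdout and return the parsed data."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Assumption: a non-zero exit status signals one of the handled errors.
        raise RuntimeError(
            f"command failed ({result.returncode}): {result.stderr.strip()}"
        )
    try:
        return json.loads(result.stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError("command did not print valid JSON") from exc


# Example call (requires the script and a valid APIFY_API_TOKEN):
# data = run_and_parse(
#     ["python3", "scripts/fetch_transcript.py", "https://youtu.be/VIDEO_ID", "--json"]
# )
```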
## Metadata

```yaml
metadata:
  clawdbot:
    emoji: "📹"
    requires:
      env: ["APIFY_API_TOKEN"]
      bins: ["python3"]
```