Introduction
# YT-to-Blog Content Engine
YouTube URL → blog post + Substack + tweets + vertical video clips. The whole content machine.
## Pipeline Overview
``` YouTube URL ↓ ① Transcript (summarize CLI) ↓ ② Blog Draft (AI-written in your voice) ↓ ③ Substack Publish (browser automation) ↓ ④ X/Twitter Post (bird CLI) ↓ ④b Facebook Group (optional reminder) ↓ ⑤ Script Splitter (extract hook moments) ↓ ⑥ HeyGen Videos (AI avatar vertical clips) ↓ ⑦ Post-Processing (ffmpeg crop/scale) ↓ 📁 Output Folder (blog.md, videos, tweet.txt, URLs) ```
**One URL in → Five platforms out.** Run the whole thing or any step individually.
---
## First-Time Setup Wizard
Walk the user through this on first use. It takes ~10 minutes once, then never again.
### Step 1: Check Dependencies
Run the setup script to check what's installed:
```bash bash skills/yt-content-engine/setup.sh ```
Required CLIs: | Tool | Purpose | Install | |------|---------|---------| | `summarize` | YouTube transcript extraction | `brew install steipete/tap/summarize` | | `bird` | X/Twitter posting | `brew install steipete/tap/bird` | | `ffmpeg` | Video post-processing | `brew install ffmpeg` | | `curl` | API calls to HeyGen | Usually pre-installed on macOS | | `python3` | Helper scripts | Usually pre-installed on macOS |
If anything is missing, tell the user what to install and wait for confirmation.
### Step 2: HeyGen API Key
1. Tell the user: "Go to https://app.heygen.com/settings — grab your API key from the API section." 2. If they don't have a HeyGen account: "Sign up at https://heygen.com — the free tier gives you a few credits to test with." 3. Save the key to `config.json` (see config schema below). 4. Test it:
```bash curl -s -H "X-Api-Key: API_KEY_HERE" https://api.heygen.com/v2/avatars | python3 -c "import sys,json; d=json.load(sys.stdin); print('✅ API key works!' if 'data' in d else '❌ Invalid key')" ```
### Step 3: HeyGen Avatar Setup
Tell the user:
> "For vertical video clips, you need a HeyGen avatar. Here's what matters: > > **Record in PORTRAIT mode** (hold your phone vertically). This is critical — if you record landscape, the avatar will be a small strip in the center of a 9:16 frame and we'll need to crop/scale it (which works but loses quality). > > Go to https://app.heygen.com/avatars → Create Instant Avatar → follow their recording guide. Stand in good lighting, look at camera, speak naturally for 2+ minutes. > > Once created, grab your Avatar ID from the avatar details page."
List their existing avatars to help them pick. Note: the avatars endpoint returns both custom and stock avatars — filter for the user's custom ones (they typically appear first and have personal names):
```bash curl -s -H "X-Api-Key: API_KEY" https://api.heygen.com/v2/avatars | python3 -c " import sys, json data = json.load(sys.stdin) for a in data.get('data', {}).get('avatars', []): print(f\" {a['avatar_id']} — {a.get('avatar_name', 'unnamed')}\") " ```
### Step 4: HeyGen Voice Clone
Tell the user:
> "Go to https://app.heygen.com/voice-clone → Clone your voice. Upload a clean audio sample (1-2 min of you speaking naturally). HeyGen will create a voice ID. > > Once done, grab your Voice ID from the voice settings."
List their voices. User's cloned voices typically appear first; stock voices come after:
```bash curl -s -H "X-Api-Key: API_KEY" https://api.heygen.com/v2/voices | python3 -c " import sys, json data = json.load(sys.stdin) for v in data.get('data', {}).get('voices', []): print(f\" {v['voice_id']} — {v.get('name', 'unnamed')} ({v.get('language', '?')})\") " ```
⚠️ **IMPORTANT:** Use the FULL voice_id (e.g., `69da9c9bca78499b98fdac698d2a20cd`), not a truncated version. The API will return "Voice validation failed" if you use a shortened ID.
### Step 5: Substack Login
Substack has no API — posting requires browser automation.
1. Open the OpenClaw managed browser: use browser tool with `profile="openclaw"` 2. Navigate to `https://substack.com/sign-in` 3. Help the user log in with their credentials 4. Verify access by navigating to their publication dashboard 5. Save the publication URL to `config.json`
The browser session persists across restarts. One-time setup.
### Step 6: Save Config
Create `skills/yt-content-engine/config.json` (relative to your workspace):
```json { "heygen": { "apiKey": "YOUR_API_KEY", "avatarId": "YOUR_AVATAR_ID", "voiceId": "YOUR_VOICE_ID" }, "substack": { "publication": "yourblog.substack.com" }, "twitter": { "handle": "@yourhandle" }, "author": { "voice": "Description of your writing voice and style", "name": "Your Name" }, "video": { "clipCount": 5, "maxClipSeconds": 60, "cropMode": "auto" } } ```
**Tip:** If the user already has a voice guide from the `yt-to-blog` skill, read it from `skills/yt-to-blog/references/voice-guide.md` and use it for the `author.voice` field.
### Step 7: Verify Everything
Run the setup script with the config in place:
```bash bash skills/yt-content-engine/setup.sh ```
It will test each component and report status.
---
## How to Invoke
### Full Pipeline ``` "Turn this into a full content suite: https://youtu.be/XXXXX" "Content engine this video: [URL]" "Run the full pipeline on [URL]" ```
### Individual Steps ``` "Just get me the transcript from [URL]" "Write a blog post from [URL]" (steps 1-2) "Post this to Substack" (step 3, after blog exists) "Tweet about this blog post" (step 4) "Generate video clips from this blog" (steps 5-7) "Just split this into scripts" (step 5 only) ```
---
## Pipeline Steps
### Step ①: Transcript
Create the output directory for this run, then fetch the YouTube transcript:
```bash mkdir -p /tmp/yt-content-engine/output-$(date +%Y-%m-%d)/scripts mkdir -p /tmp/yt-content-engine/output-$(date +%Y-%m-%d)/videos ```
```bash summarize "YOUTUBE_URL" --extract > /tmp/yt-content-engine/transcript.txt ```
The `--extract` flag prints the raw transcript without LLM summarization. Read the output. If it fails (no captions available), try with `--youtube yt-dlp` for auto-generated captions, or tell the user and suggest they provide a manual transcript.
### Step ②: Blog Draft
Transform the transcript into a polished long-form blog post.
**Load the author voice** from `config.json` → `author.voice`. If a more detailed voice guide exists at `skills/yt-to-blog/references/voice-guide.md`, read and use that too.
**Analysis phase** — before writing, extract from the transcript: - Core thesis — the single strongest argument or revelation - Key data points — statistics, quotes, dates, names - Narrative moments — anecdotes, examples, scenes - Source links — URLs, studies, references mentioned - Missing context — what does the reader need that the video assumed?
**Writing structure:** 1. **Cold open (1-3 paragraphs):** Scene-setting. Specific, sensory, emotional hook before data. 2. **Thesis pivot (1 paragraph):** Connect scene to the bigger story. 3. **Data body (5-15 paragraphs):** Alternate data and editorial. Each stat gets a punch line. Subheadings for major breaks only. 4. **Callback (1-2 paragraphs):** Return to opening scene/metaphor. 5. **Closing (3-6 short paragraphs):** Escalating fragments. Final hammer line.
**Writing rules:** - Vary sentence length dramatically — long data sentences, then short punches - Em dashes for asides, not parentheses - Sentence fragments for emphasis - No bullet lists in the body — narrative flow - Inline source links, no footnotes - No "in conclusion" or "to summarize" - Credit video source naturally: "As [Name] put it..." with link - Target: 1,500-3,000 words
**Generate 3-5 headline options** with distinct strategies (contrast/irony, revelation, moral framing, callback). Each with a subtitle. Let the user pick.
Save the final draft to the output folder as `blog.md`.
### Step ③: Substack Publish
Post the blog to Substack via browser automation.
1. Read `config.json` → `substack.publication` 2. Open managed browser (`profile="openclaw"`) 3. Navigate to `https://PUBLICATION.substack.com/publish/post` 4. Click the title field, type the title 5. Click the subtitle area, type the subtitle 6. Click the body area 7. Write markdown to a temp file, copy to clipboard (`pbcopy < /tmp/post.md`), paste into editor (Meta+v) 8. Substack auto-saves as draft
**Known issues:** - Em dashes (`—`) may garble as `,Äî` during clipboard paste → find/replace after paste - Large posts: pause briefly between paste and verification - Verify draft at `https://PUBLICATION.substack.com/publish`
**Default: save as draft.** Only publish if the user explicitly says "publish it" — always confirm first.
Save the Substack URL to `output/substack-url.txt`.
### Step ④: X/Twitter Post
Compose and post using the `bird` CLI.
**Compose the tweet/thread:** - If the blog has a single killer hook → single tweet with link - If there are multiple strong points → thread (3-5 tweets) - Include the Substack URL - Match the author's voice but punchier — tweets are hooks, not summaries - Use the handle from `config.json` → `twitter.handle`
**Post with bird:** ```bash # Single tweet bird tweet "Your tweet text here"
# Thread (post first tweet, then reply to it) bird tweet "Tweet 1 text here" # Note the returned tweet ID, then: bird reply TWEET_ID "Tweet 2 text here" # And chain: bird reply TWEET_2_ID "Tweet 3 text here" ```
**Always show the user the tweet text before posting and get confirmation.**
Save tweet text to `output/tweet.txt`.
### Step ④b: Facebook Group (Optional)
If `config.json` includes a `facebook.group` URL, remind the user to post to their Facebook Group.
**Note:** Facebook Group API posting is heavily restricted. Browser automation is unreliable due to Facebook's anti-bot measures. Best approach:
1. Draft a Facebook post version of the content (shorter, more casual than tweet) 2. Save to `output/facebook-post.txt` 3. Remind the user: "Don't forget to post to [Group Name] — here's your draft" 4. User posts manually
This keeps Facebook distribution in the workflow without fighting their API restrictions.
### Step ⑤: Script Splitter
Extract 3-5 "hook moments" from the blog post and rewrite each as a spoken-word script for vertical video.
**What to look for** (scan the blog for these patterns): 1. **Hook/Controversy** — the most provocative claim, the thing that makes people stop scrolling 2. **Data Bomb** — a surprising statistic or fact that reframes understanding 3. **Counterintuitive Take** — something that contradicts conventional wisdom 4. **Emotional Moment** — a story, anecdote, or human element that creates connection 5. **Call-to-Action Closer** — a rallying cry, challenge, or "what you should do now"
Not every blog will have all five. Extract what's there. Minimum 3 clips.
**Rewrite rules for spoken delivery:** - **Hook first** — open with the most attention-grabbing line. No preamble. - **Conversational** — write for speaking, not reading. Contractions, natural rhythm. - **30-60 seconds each** — roughly 75-150 words per clip - **Self-contained** — each clip must work on its own, no "as I mentioned earlier" - **End with punch** — close on the strongest line, not a trailing thought - **No stage directions** — just the words to speak, nothing else
**Format each script:** ``` CLIP 1: [descriptive title] --- [Script text here, 75-150 words] ```
Use `config.json` → `video.clipCount` for the target number of clips (default: 5). Use `config.json` → `video.maxClipSeconds` for max duration (default: 60).
Save scripts to `output/scripts/clip-1.txt`, `clip-2.txt`, etc.
### Step ⑥: HeyGen Video Generation
Submit each script to HeyGen API v2 to generate AI avatar videos.
**Read config:** ```bash # Parse config.json API_KEY=$(python3 -c "import json; c=json.load(open('config.json')); print(c['heygen']['apiKey'])") AVATAR_ID=$(python3 -c "import json; c=json.load(open('config.json')); print(c['heygen']['avatarId'])") VOICE_ID=$(python3 -c "import json; c=json.load(open('config.json')); print(c['heygen']['voiceId'])") ```
**For each script, submit a video generation request:**
```bash curl -s -X POST "https://api.heygen.com/v2/video/generate" \ -H "X-Api-Key: $API_KEY" \ -H "Content-Type: application/json" \ -d '{ "video_inputs": [{ "character": { "type": "avatar", "avatar_id": "'"$AVATAR_ID"'", "avatar_style": "normal" }, "voice": { "type": "text", "input_text": "'"$(cat output/scripts/clip-1.txt)"'", "voice_id": "'"$VOICE_ID"'" } }], "dimension": { "width": 1080, "height": 1920 } }' ```
**Parse the response** to get `video_id`: ```python import json response = json.loads(response_text) video_id = response["data"]["video_id"] ```
**Submit ALL clips before polling.** HeyGen renders in parallel — submit all scripts first, collect all video_ids, then poll them all. This cuts total render time from N×3min to ~3min.
**Poll for completion** (every 15 seconds, timeout after 10 minutes):
```bash curl -s -H "X-Api-Key: $API_KEY" \ "https://api.heygen.com/v1/video_status.get?video_id=$VIDEO_ID" \ | python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(d['status'], d.get('video_url',''))" ```
Statuses: `pending` → `processing` → `completed` (with `video_url`) or `failed` (with `error`).
**Download completed videos:** ```bash curl -L -o "output/videos/clip-1-raw.mp4" "$VIDEO_URL" ```
**Credit note:** ~1 credit per 1 minute of video. A typical 5-clip run uses ~3 credits. Warn the user about credit usage before submitting.
### Step ⑦: Video Post-Processing
If the avatar was recorded in landscape (common), the 9:16 video will show a small avatar strip centered in a large frame with background fill. Fix this with ffmpeg.
**Check `config.json` → `video.cropMode`:** - `"auto"` — detect and crop automatically - `"portrait"` — skip cropping (avatar was recorded in portrait) - `"manual"` — ask user for crop coordinates
**Auto-crop pipeline:**
```bash # 1. Detect content bounds by scanning center column for non-background pixels # Extract a single frame ffmpeg -i input.mp4 -vframes 1 -y /tmp/frame.png
# 2. Use ffmpeg cropdetect to find content bounds ffmpeg -i input.mp4 -vf "cropdetect=24:16:0" -frames:v 30 -f null - 2>&1 | grep cropdetect
# Parse the crop values from output: crop=W:H:X:Y
# 3. Crop content strip, scale up, center-crop to 1080x1920 ffmpeg -i input.mp4 \ -vf "crop=DETECTED_W:DETECTED_H:DETECTED_X:DETECTED_Y,scale=1080:-1,crop=1080:1920:0:(ih-1920)/2" \ -c:a copy \ -y output.mp4 ```
**Alternative manual detection** (preferred — cropdetect often fails when background is white/light):
HeyGen typically renders landscape avatars centered on a white/light background in the 9:16 frame. Scan the center column for non-white pixels to find the actual content strip:
```bash # Extract a frame, then scan center column for content bounds ffmpeg -y -ss 5 -i input.mp4 -frames:v 1 /tmp/frame.png 2>/dev/null
ffmpeg -y -i /tmp/frame.png -vf "crop=1:ih:iw/2:0,format=gray" -f rawvideo -pix_fmt gray - 2>/dev/null | \ python3 -c " import sys data = sys.stdin.buffer.read() first = last = None for i, b in enumerate(data): if b < 240: # Non-white pixel = actual content if first is None: first = i last = i if first is not None: print(f'CONTENT_Y={first}') print(f'CONTENT_HEIGHT={last - first}') print(f'CENTER={( first + last) // 2}') else: print('No content bounds detected — avatar may already fill the frame') " ```
Then crop the content strip, scale proportionally to fill width, and center-crop to 9:16: ```bash ffmpeg -y -i input.mp4 \ -vf "crop=iw:CONTENT_HEIGHT:0:CONTENT_Y,scale=-1:1920,crop=1080:1920:(ow-1080)/2:0" \ -c:v libx264 -crf 23 -preset fast -c:a aac \ output.mp4 ```
**Proven crop values for common HeyGen landscape avatars** (1080x1920 canvas): - Content strip typically at y≈656, height≈607px - Example: `crop=1080:607:0:656,scale=3413:1920,crop=1080:1920:1166:0` - Always detect per-video — avatar placement can shift
**Save processed videos** to `output/videos/clip-1.mp4`, `clip-2.mp4`, etc.
If crop mode is `portrait`, just copy the raw files: ```bash cp output/videos/clip-1-raw.mp4 output/videos/clip-1.mp4 ```
### Step ⑧: Output
Organize everything in a dated output folder:
``` output-YYYY-MM-DD/ ├── blog.md # Final blog post ├── tweet.txt # Tweet text (posted or ready to post) ├── substack-url.txt # URL of Substack draft/post ├── scripts/ │ ├── clip-1.txt # Spoken word scripts │ ├── clip-2.txt │ └── ... ├── videos/ │ ├── clip-1.mp4 # Final processed vertical videos │ ├── clip-2.mp4 │ └── ... └── manifest.json # Run metadata ```
**manifest.json:** ```json { "source": "https://youtu.be/XXXXX", "date": "2026-02-03", "blog": "blog.md", "substackUrl": "https://...", "tweetUrl": "https://...", "clips": ["clip-1.mp4", "clip-2.mp4", "..."], "heygenCreditsUsed": 3 } ```
Report the summary to the user: - ✅ Blog post: X words - ✅ Substack: [URL] (draft/published) - ✅ Tweet: posted / ready to post - ✅ X video clips generated and processed - 💰 HeyGen credits used: ~X
---
## Config Reference
Config file: `skills/yt-content-engine/config.json` (relative to workspace root)
| Key | Description | Default | |-----|-------------|---------| | `heygen.apiKey` | HeyGen API key | Required | | `heygen.avatarId` | Your HeyGen avatar ID | Required | | `heygen.voiceId` | Your cloned voice ID | Required | | `substack.publication` | Substack subdomain | Required | | `twitter.handle` | X/Twitter handle | Required | | `author.voice` | Writing style description | Recommended | | `author.name` | Author name for attribution | Recommended | | `video.clipCount` | Number of clips to generate | `5` | | `video.maxClipSeconds` | Max seconds per clip | `60` | | `video.cropMode` | `auto`, `portrait`, or `manual` | `auto` |
---
## Tips & Troubleshooting
- **HeyGen rendering takes 2-3 min per clip.** Set expectations — a 5-clip run takes 10-15 minutes of render time. - **Portrait avatars save time.** No cropping needed. Worth re-recording if you use this regularly. - **Substack session expires?** Re-run the browser login step (Step 5 of setup). - **bird CLI not posting?** Run `bird auth` to re-authenticate. - **Bad crop detection?** Switch `cropMode` to `manual` and eyeball the content bounds from a frame export. - **HeyGen quota errors?** Check credits at https://app.heygen.com/settings — upgrade plan or reduce clip count. - **Transcript unavailable?** Some videos don't have captions. Try `summarize "URL" --extract --youtube yt-dlp` for auto-generated captions, or ask the user for a manual transcript.