# Ralph Loops Skill
> **First time?** Read [SETUP.md](./SETUP.md) first to install dependencies and verify your setup.
Autonomous AI agent loops for iterative development. Based on Geoffrey Huntley's Ralph Wiggum technique, as documented by Clayton Farr.
- **Script:** `skills/ralph-loops/scripts/ralph-loop.mjs`
- **Dashboard:** `skills/ralph-loops/dashboard/` (run with `node server.mjs`)
- **Templates:** `skills/ralph-loops/templates/`
- **Archive:** `~/clawd/logs/ralph-archive/`
---
## ⚠️ Known Issues
### Claude Code Version Compatibility
**Claude Code 2.1.29 has a critical bug** that spawns orphaned sub-agents consuming 99% CPU. Iterations fail with "exit code null" on first run.
**Fix:** Downgrade to 2.1.25:
```bash
npm install -g @anthropic-ai/claude-code@2.1.25
```
**Verify:**
```bash
claude --version  # Should show 2.1.25
```
This was discovered 2026-02-01. Check if newer versions fix the issue before upgrading.
---
## ⚠️ Don't Block the Conversation!
When running a Ralph loop, **don't monitor it synchronously**. The loop runs as a separate Claude CLI process — you can keep chatting.
**❌ Wrong (blocks conversation):**
```
Start loop → sleep 60 → poll → sleep 60 → poll → ... (6 minutes of silence)
```

**✅ Right (stays responsive):**
```
Start loop → "It's running, I'll check periodically" → keep chatting → check on heartbeats
```
**How to monitor without blocking:**
1. Start the loop with `node ralph-loop.mjs ...` (runs in background)
2. Tell the human: "Loop running. I'll check progress periodically or you can ask."
3. Check via `process poll <sessionId>` when asked or during heartbeats
4. Use the dashboard at http://localhost:3939 for real-time visibility
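If you launch the loop from your own Node script rather than a shell, a minimal sketch of step 1 is to spawn it detached and `unref()` it, so the parent returns immediately. The helper name `startLoopInBackground` is hypothetical; the flags in the usage comment are the ones shown elsewhere in this document.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical helper: start a command detached from the current process so
// the caller (and the conversation) is not blocked while it runs.
function startLoopInBackground(command, args, cwd = process.cwd()) {
  const child = spawn(command, args, {
    cwd,
    detached: true, // run in its own process group
    stdio: "ignore", // don't tie the child's output to our terminal
  });
  child.unref(); // let the parent exit without waiting for the child
  return child.pid; // keep this to poll or stop the loop later
}

// Example:
// const pid = startLoopInBackground("node", [
//   "skills/ralph-loops/scripts/ralph-loop.mjs",
//   "--prompt", "./PROMPT_build.md", "--max", "20", "--done", "RALPH_DONE",
// ]);
```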
**The loop is autonomous** — that's the whole point. Don't babysit it at the cost of ignoring your human.
---
## Trigger Phrases
When the human says:

| Phrase | Action |
|--------|--------|
| **"Interview me about system X"** | Start Phase 1 requirements interview |
| **"Start planning system X"** | Run `./loop.sh plan` (needs specs first) |
| **"Start building system X"** | Run `./loop.sh build` (needs plan first) |
| **"Ralph loop over X"** | **ASK which phase** (see below) |
### When Human Says "Ralph Loop" — Clarify the Phase!
Don't assume which phase. Ask:
> "Which type of Ralph loop are we doing?
>
> 1️⃣ **Interview** — I'll ask you questions to build specs (Phase 1)
> 2️⃣ **Planning** — I'll iterate on an implementation plan (Phase 2)
> 3️⃣ **Building** — I'll implement from a plan, one task per iteration (Phase 3)
> 4️⃣ **Generic** — Simple iterative refinement on a single topic"
**Then proceed based on their answer:**
| Choice | Action |
|--------|--------|
| Interview | Use `templates/requirements-interview.md` protocol |
| Planning | Need specs first → run planning loop with `PROMPT_plan.md` |
| Building | Need plan first → run build loop with `PROMPT_build.md` |
| Generic | Create prompt file, run `ralph-loop.mjs` directly |
### Generic Ralph Loop Flow (Option 4)
For simple iterative refinement (not full system builds):
1. **Clarify the task** — What exactly should be improved/refined?
2. **Create a prompt file** — Save to `/tmp/ralph-prompt-<task>.md`
3. **Set completion criteria** — What signals "done"?
4. **Run the loop:**
   ```bash
   node skills/ralph-loops/scripts/ralph-loop.mjs \
     --prompt "/tmp/ralph-prompt-<task>.md" \
     --model opus \
     --max 10 \
     --done "RALPH_DONE"
   ```
5. **Or spawn as sub-agent** for long-running tasks
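The control flow behind step 4 is small. Below is a minimal sketch, not the actual `ralph-loop.mjs`: run one iteration with fresh context, stop as soon as the output contains the done marker, otherwise stop at the iteration cap. `runIteration` is a hypothetical callback; in practice it would start a new Claude CLI process with the prompt file and return its output.

```javascript
// Minimal sketch of a Ralph loop driver (hypothetical, not ralph-loop.mjs).
// runIteration(i) must start a FRESH agent session and return its text output.
async function runRalphLoop({ runIteration, maxIterations, doneMarker }) {
  for (let i = 1; i <= maxIterations; i++) {
    const output = await runIteration(i); // new context every time
    if (output.includes(doneMarker)) {
      return { done: true, iterations: i };
    }
  }
  // Cap reached without the agent signalling completion.
  return { done: false, iterations: maxIterations };
}
```

Because each iteration is a separate call, nothing carries over between iterations except what the agent wrote to disk (plan, specs, commits).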
---
## Core Philosophy
> "Human roles shift from 'telling the agent what to do' to 'engineering conditions where good outcomes emerge naturally through iteration.'"
>
> — Clayton Farr
Three principles drive everything:
1. **Context is scarce** — With ~176K usable tokens from a 200K window, keep each iteration lean
2. **Plans are disposable** — A drifting plan is cheaper to regenerate than salvage
3. **Backpressure beats direction** — Engineer environments where wrong outputs get rejected automatically
---
## Three-Phase Workflow
```
┌─────────────────────────────────────────────────────────────────────┐
│  Phase 1: REQUIREMENTS                                              │
│  Human + LLM conversation → JTBD → Topics → specs/*.md              │
├─────────────────────────────────────────────────────────────────────┤
│  Phase 2: PLANNING                                                  │
│  Gap analysis (specs vs code) → IMPLEMENTATION_PLAN.md              │
├─────────────────────────────────────────────────────────────────────┤
│  Phase 3: BUILDING                                                  │
│  One task per iteration → fresh context → backpressure → commit     │
└─────────────────────────────────────────────────────────────────────┘
```
### Phase 1: Requirements (Talk to Human)
**Goal:** Understand what to build BEFORE building it.
This is the most important phase. Use structured conversation to:
1. **Identify Jobs to Be Done (JTBD)**
   - What user need or outcome are we solving?
   - Not features — outcomes

2. **Break JTBD into Topics of Concern**
   - Each topic = one distinct aspect/component
   - Use the "one sentence without 'and'" test
     - ✓ "The color extraction system analyzes images to identify dominant colors"
     - ✗ "The user system handles authentication, profiles, and billing" → 3 topics

3. **Create Specs for Each Topic**
   - One markdown file per topic in `specs/`
   - Capture requirements, acceptance criteria, edge cases
**Template:** `templates/requirements-interview.md`
### Phase 2: Planning (Gap Analysis)
**Goal:** Create a prioritized task list without implementing anything.
Uses `PROMPT_plan.md` in the loop:
- Study all specs
- Study existing codebase
- Compare specs vs code (gap analysis)
- Generate `IMPLEMENTATION_PLAN.md` with prioritized tasks
- **NO implementation** — planning only
Usually completes in 1-2 iterations.
### Phase 3: Building (One Task Per Iteration)
**Goal:** Implement tasks one at a time with fresh context.
Uses `PROMPT_build.md` in the loop:
1. Read `IMPLEMENTATION_PLAN.md`
2. Pick the most important task
3. Investigate codebase (don't assume not implemented)
4. Implement
5. Run validation (backpressure)
6. Update plan, commit
7. Exit → fresh context → next iteration
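In the real prompt, step 2 is left to the agent's judgment. If you want a deterministic fallback (for a dashboard or a sanity check), one option is to take the first unchecked item. This sketch assumes, hypothetically, that `IMPLEMENTATION_PLAN.md` is a markdown checklist ordered most-important first; the template does not guarantee that format.

```javascript
// Hypothetical plan format: "- [ ] task" (open) / "- [x] task" (done),
// listed in priority order, most important first.
function nextTask(planMarkdown) {
  for (const line of planMarkdown.split(/\r?\n/)) {
    const match = line.match(/^[ \t]*[-*][ \t]+\[ \][ \t]+(.+)$/);
    if (match) return match[1].trim(); // first open item = highest priority
  }
  return null; // no open tasks left
}
```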
**Key insight:** One task per iteration keeps context lean. The agent stays in the "smart zone" instead of accumulating cruft.
**Why fresh context matters:**
- **No accumulated mistakes** — Each iteration starts clean; previous errors don't compound
- **Full context budget** — The whole window (~176K usable tokens) goes to THIS task, not shared with finished work
- **Reduced hallucination** — Shorter contexts = more grounded responses
- **Natural checkpoints** — Each commit is a save point; easy to revert single iterations
---
## File Structure
```
project/
├── loop.sh                  # Ralph loop script
├── PROMPT_plan.md           # Planning mode instructions
├── PROMPT_build.md          # Building mode instructions
├── AGENTS.md                # Operational guide (~60 lines max)
├── IMPLEMENTATION_PLAN.md   # Prioritized task list (generated)
└── specs/                   # Requirement specs
    ├── topic-a.md
    ├── topic-b.md
    └── ...
```
### File Purposes
| File | Purpose | Who Creates |
|------|---------|-------------|
| `specs/*.md` | Source of truth for requirements | Human + Phase 1 |
| `PROMPT_plan.md` | Instructions for planning mode | Copy from template |
| `PROMPT_build.md` | Instructions for building mode | Copy from template |
| `AGENTS.md` | Build/test/lint commands | Human + Ralph |
| `IMPLEMENTATION_PLAN.md` | Task list with priorities | Ralph (Phase 2) |
### Project Organization (Systems)
For Clawdbot systems, each Ralph project lives in `<workspace>/systems/<name>/`:
```
systems/
├── health-tracker/              # Example system
│   ├── specs/
│   │   ├── daily-tracking.md
│   │   └── test-scheduling.md
│   ├── PROMPT_plan.md
│   ├── PROMPT_build.md
│   ├── AGENTS.md
│   ├── IMPLEMENTATION_PLAN.md   # ← exists = past Phase 1
│   └── src/
└── activity-planner/
    ├── specs/                   # ← empty = still in Phase 1
    └── ...
```
### Phase Detection (Auto)
Detect current phase by checking what files exist:
| What Exists | Current Phase | Next Action |
|-------------|---------------|-------------|
| Nothing / empty `specs/` | Phase 1: Requirements | Run requirements interview |
| `specs/*.md` but no `IMPLEMENTATION_PLAN.md` | Ready for Phase 2 | Run `./loop.sh plan` |
| `specs/*.md` + `IMPLEMENTATION_PLAN.md` | Phase 2 or 3 | Review plan, run `./loop.sh build` |
| Plan shows all tasks complete | Done | Archive or iterate |
**Quick check:**
```bash
# What phase are we in? (run from the project root)
if [ -z "$(ls specs/*.md 2>/dev/null)" ]; then
  echo "Phase 1: Need specs"
elif [ ! -f IMPLEMENTATION_PLAN.md ]; then
  echo "Phase 2: Need plan"
else
  echo "Phase 3: Ready to build (or done)"
fi
```
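The shell check does not distinguish "ready to build" from "done". A minimal Node sketch that also covers the last table row, assuming (hypothetically) that `IMPLEMENTATION_PLAN.md` marks completed tasks `- [x]` and open ones `- [ ]`; `detectPhase` and its return strings are illustrative names.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical phase detector mirroring the Phase Detection table.
function detectPhase(projectDir) {
  const specsDir = path.join(projectDir, "specs");
  const specs = fs.existsSync(specsDir)
    ? fs.readdirSync(specsDir).filter((f) => f.endsWith(".md"))
    : [];
  if (specs.length === 0) return "phase1-requirements";

  const planPath = path.join(projectDir, "IMPLEMENTATION_PLAN.md");
  if (!fs.existsSync(planPath)) return "phase2-planning";

  const plan = fs.readFileSync(planPath, "utf8");
  const open = (plan.match(/^[ \t]*[-*][ \t]+\[ \]/gm) || []).length;
  const closed = (plan.match(/^[ \t]*[-*][ \t]+\[[xX]\]/gm) || []).length;
  // Assumes the checklist format; a plan without checkboxes counts as buildable.
  return open === 0 && closed > 0 ? "done" : "phase3-building";
}
```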
---
## JTBD Breakdown
The hierarchy matters:
```
JTBD (Job to Be Done)
└── Topic of Concern (1 per spec file)
    └── Tasks (many per topic, in IMPLEMENTATION_PLAN.md)
```
**Example:**
- **JTBD:** "Help designers create mood boards"
- **Topics:**
  - Image collection → `specs/image-collection.md`
  - Color extraction → `specs/color-extraction.md`
  - Layout system → `specs/layout-system.md`
  - Sharing → `specs/sharing.md`
- **Tasks:** Each spec generates multiple implementation tasks
### Topic Scope Test
> Can you describe the topic in one sentence without "and"?
If you need "and" or "also", it's probably multiple topics. Split it.
**When to split:**
- Multiple verbs in the description → separate topics
- Different user personas involved → separate topics
- Could be implemented by different teams → separate topics
- Has its own failure modes → probably its own topic
**Example split:**
```
❌ "User management handles registration, authentication, profiles, and permissions"

✅ Split into:
- "Registration creates new user accounts from email/password"
- "Authentication verifies user identity via login flow"
- "Profiles let users view and edit their information"
- "Permissions control what actions users can perform"
```
**Counter-example (don't split):**
```
✅ Keep together: "Color extraction analyzes images and returns dominant color palettes"
   Why: "analyzes" and "returns" are steps in one operation, not separate concerns.
```
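Because the counter-example shows that "and" is not always a split signal, any automated check can only flag candidates for a human to review. A hypothetical sketch: it looks for the words "and"/"also" and for comma-separated lists (an assumed heuristic), so it would flag the color-extraction sentence too.

```javascript
// Hypothetical lint for topic descriptions: flags sentences that MAY cover
// more than one topic of concern. It cannot tell a real split from a
// multi-step single operation, so treat hits as prompts for review.
function flagTopicScope(sentence) {
  const reasons = [];
  if (/\b(and|also)\b/i.test(sentence)) reasons.push("contains 'and'/'also'");
  // Assumed heuristic: two or more commas usually means a list of concerns.
  if ((sentence.match(/,/g) || []).length >= 2) reasons.push("comma-separated list");
  return reasons;
}
```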
---
## Backpressure Mechanisms
Autonomous loops converge when wrong outputs get rejected. Three layers:
### 1. Downstream Gates (Hard)
Tests, type-checking, linting, build validation. Deterministic.
```markdown
# In AGENTS.md
## Validation
- Tests: `npm test`
- Typecheck: `npm run typecheck`
- Lint: `npm run lint`
```
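Hard gates are just commands whose exit code decides pass or fail. A minimal sketch of running the validation commands in order and stopping at the first failure, using Node's `child_process.spawnSync` with `shell: true` so the strings can be copied straight from `AGENTS.md`. The helper name `runGates` is hypothetical.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical gate runner: each command must exit 0 or the iteration fails.
function runGates(commands, cwd = process.cwd()) {
  for (const cmd of commands) {
    const result = spawnSync(cmd, { cwd, shell: true, stdio: "inherit" });
    if (result.status !== 0) {
      return { ok: false, failed: cmd, status: result.status };
    }
  }
  return { ok: true };
}

// Example, matching the AGENTS.md snippet above:
// runGates(["npm test", "npm run typecheck", "npm run lint"]);
```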
### 2. Upstream Steering (Soft)
Existing code patterns guide the agent. It discovers conventions through exploration.

### 3. LLM-as-Judge (Subjective)
For subjective criteria (tone, UX, aesthetics), use another LLM call with binary pass/fail.
> Start with hard gates. Add LLM-as-judge for subjective criteria only after mechanical backpressure works.
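For the subjective layer, the key is forcing a binary verdict and treating anything ambiguous as a rejection. A minimal sketch, where `callModel` is a hypothetical async function that sends a prompt to whatever LLM you use and returns its text:

```javascript
// Hypothetical LLM-as-judge wrapper with a strict binary verdict.
async function judge(criterion, candidate, callModel) {
  const prompt = [
    "You are a strict reviewer.",
    `Criterion: ${criterion}`,
    "Candidate:",
    candidate,
    "Answer with exactly one word: PASS or FAIL.",
  ].join("\n");
  const reply = (await callModel(prompt)).trim().toUpperCase();
  // Anything other than a clean PASS counts as rejection (backpressure).
  return reply === "PASS";
}
```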
---
## Prompt Structure
Geoffrey's prompts follow a numbered pattern:
| Section | Purpose |
|---------|---------|
| 0a-0d | **Orient:** Study specs, source, current plan |
| 1-4 | **Main instructions:** What to do this iteration |
| 999+ | **Guardrails:** Invariants (higher number = more critical) |
### The Numbered Guardrails Pattern
Guardrails use escalating numbers (99999, 999999, 9999999...) to signal priority:
```markdown
99999. Important: Capture the why in documentation.

999999. Important: Single sources of truth, no migrations.

9999999. Create git tags after successful builds.

99999999. Add logging if needed to debug.

999999999. Keep IMPLEMENTATION_PLAN.md current.
```
**Why this works:**
1. **Visual prominence** — Large numbers stand out, harder to skip
2. **Implicit priority** — More 9s = more critical (like DEFCON levels in reverse)
3. **No collisions** — Sparse numbering lets you insert new rules without renumbering
4. **Mnemonic** — Claude treats these as invariants, not suggestions
**The "Important:" prefix** is deliberate — it triggers Claude's attention.
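Since priority is encoded in the number itself, ordering the guardrails is mechanical. A small hypothetical helper that pulls guardrail lines out of a prompt and lists them most critical first. It assumes each guardrail line starts with three or more 9s and a period, as in the example above, and compares by digit count so large numbers never lose precision.

```javascript
// Hypothetical helper: list guardrails from a prompt, most critical first.
function rankGuardrails(promptText) {
  return promptText
    .split(/\r?\n/)
    .map((line) => line.match(/^(9{3,})\.\s+(.*)$/))
    .filter(Boolean)
    .map(([, num, text]) => ({ level: num.length, text }))
    .sort((a, b) => b.level - a.level); // more 9s = more critical
}
```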
### Key Language Patterns
Use Geoffrey's specific phrasing — it matters:
- "study" (not "read" or "look at")
- "don't assume not implemented" (critical!)
- "using parallel subagents" / "up to N subagents"
- "only 1 subagent for build/tests" (backpressure control)
- "Ultrathink" (deep reasoning trigger)
- "capture the why"
- "keep it up to date"
- "resolve them or document them"
---
## Quick Start
### 1. Set Up Project Structure
```bash
mkdir -p myproject/specs
cd myproject
git init  # Ralph expects git for commits

# Copy templates
cp .//templates/PROMPT_plan.md .
cp .//templates/PROMPT_build.md .
cp .//templates/AGENTS.md .
cp .//templates/loop.sh .
chmod +x loop.sh
```
### 2. Customize Templates (Required!)
**PROMPT_plan.md** — Replace `[PROJECT_GOAL]` with your actual goal:
```markdown
# Before:
ULTIMATE GOAL: We want to achieve [PROJECT_GOAL].

# After:
ULTIMATE GOAL: We want to achieve a fully functional mood board app with image upload and color extraction.
```
**PROMPT_build.md** — Adjust source paths if not using `src/`:
```markdown
# Before:
0c. For reference, the application source code is in `src/*`.

# After:
0c. For reference, the application source code is in `lib/*`.
```
**AGENTS.md** — Update build/test/lint commands for your stack.
### 3. Phase 1: Requirements Gathering (Don't Skip!)
This phase happens WITH the human. Use the interview template:
```bash
cat .//templates/requirements-interview.md
```
**The workflow:**
1. Discuss the JTBD (Job to Be Done) — outcomes, not features
2. Break into Topics of Concern (each passes the "one sentence" test)
3. Write a spec file for each topic: `specs/topic-name.md`
4. Human reviews and approves specs

**Example output:**
```
specs/
├── image-collection.md
├── color-extraction.md
├── layout-system.md
└── sharing.md
```
### 4. Phase 2: Planning
```bash
./loop.sh plan
```
Wait for `IMPLEMENTATION_PLAN.md` to be generated (usually 1-2 iterations). Review it — this is your task list.
### 5. Phase 3: Building
```bash
./loop.sh build 20  # Max 20 iterations
```
Watch it work. Add backpressure (tests, lints) as patterns emerge. Check commits for progress.
---
## Loop Script Options
```bash
./loop.sh           # Build mode, unlimited
./loop.sh 20        # Build mode, max 20 iterations
./loop.sh plan      # Plan mode, unlimited
./loop.sh plan 5    # Plan mode, max 5 iterations
```
Or use the Node.js wrapper for more control:
```bash
node skills/ralph-loops/scripts/ralph-loop.mjs \
  --prompt "./PROMPT_build.md" \
  --model opus \
  --max 20 \
  --done "RALPH_DONE"
```
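The `loop.sh` argument convention is an optional mode word followed by an optional iteration cap, where no cap means unlimited. If you write your own wrapper and want it to accept the same forms, a sketch of the parsing might look like this. `parseLoopArgs` is hypothetical, it also accepts an explicit `build` (as in `./loop.sh build 20` used in this document), and `Infinity` stands in for "unlimited".

```javascript
// Hypothetical parser for the loop.sh argument forms used in this document:
//   []              -> build, unlimited
//   ["20"]          -> build, max 20
//   ["build", "20"] -> build, max 20
//   ["plan"]        -> plan, unlimited
//   ["plan", "5"]   -> plan, max 5
function parseLoopArgs(argv) {
  let mode = "build"; // default when no mode word is given
  let rest = argv;
  if (rest[0] === "plan" || rest[0] === "build") {
    mode = rest[0];
    rest = rest.slice(1);
  }
  let max = Infinity; // no cap = unlimited iterations
  if (rest.length > 0) {
    const n = Number(rest[0]);
    if (!Number.isInteger(n) || n < 1) {
      throw new Error(`Invalid iteration cap: ${rest[0]}`);
    }
    max = n;
  }
  return { mode, max };
}
```

Rejecting anything that is neither a mode word nor a positive integer means a typo like `pln` fails loudly instead of silently running in build mode.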
---
## When to Regenerate the Plan
Plans drift. Regenerate when:
- Ralph is going off track (implementing wrong things)
- Plan feels stale or doesn't match current state
- Too much clutter from completed items
- You've made significant spec changes
- You're confused about what's actually done
Just switch back to planning mode:
```bash
./loop.sh plan
```
Regeneration cost is one Planning loop. Cheap compared to Ralph going in circles.
---
## Safety
Ralph requires `--dangerously-skip-permissions` to run autonomously. This bypasses Claude Code's permission prompts entirely.
**Philosophy:** "It's not if it gets popped, it's when. And what is the blast radius?"
**Protections:**
- Run in isolated environments (Docker, VM)
- Expose only the API keys the task needs
- Grant no access to private data beyond what the task requires
- Restrict network connectivity where possible
- **Escape hatches:** Ctrl+C stops the loop; `git reset --hard` reverts uncommitted changes to tracked files
---
## Cost Expectations
| Task Type | Model | Iterations | Est. Cost |
|-----------|-------|------------|-----------|
| Generate plan | Opus | 1-2 | $0.50-1.00 |
| Implement simple feature | Opus | 3-5 | $1.00-2.00 |
| Implement complex feature | Opus | 10-20 | $3.00-8.00 |
| Full project buildout | Opus | 50+ | $15-50+ |

**Tip:** Use Sonnet for simpler tasks where the plan is clear. Use Opus for planning and complex reasoning.
---
## Real-World Results
From Geoffrey Huntley:
- 6 repos generated overnight at a YC hackathon
- $50k contract completed for $297 in API costs
- Created an entire programming language over 3 months
---
## Advanced: Running as Sub-Agent
For long loops, spawn as sub-agent so main session stays responsive:
```javascript
sessions_spawn({
  task: `cd /path/to/project && ./loop.sh build 20
Summarize what was implemented when done.`,
  label: "ralph-build",
  model: "opus"
})
```

Check progress:
```javascript
sessions_list({ kinds: ["spawn"] })
sessions_history({ label: "ralph-build", limit: 5 })
```
---
## Troubleshooting
### Ralph keeps implementing the same thing
- Plan is stale → regenerate with `./loop.sh plan`
- Backpressure missing → add tests that catch duplicates

### Ralph goes in circles
- Add more specific guardrails to prompts
- Check if specs are ambiguous
- Regenerate plan

### Context getting bloated
- Ensure one task per iteration (check prompt)
- Keep AGENTS.md under 60 lines
- Move status/progress to IMPLEMENTATION_PLAN.md, not AGENTS.md

### Tests not running
- Check AGENTS.md has correct validation commands
- Ensure backpressure section in prompt references AGENTS.md
---
## Edge Cases
### Projects Without Git
The loop script expects git for commits and pushes. For projects without version control:
**Option 1: Initialize git anyway** (recommended)
```bash
git init
git add -A
git commit -m "Initial commit before Ralph"
```
**Option 2: Modify the prompts and loop script**
- Remove git-related guardrails from PROMPT_build.md
- Remove the git push section from loop.sh
- Use file backups instead: add `mkdir -p backups && cp -r src/ backups/iteration-$ITERATION/` to loop.sh
**Option 3: Use tarball snapshots**
```bash
# Add to loop.sh before each iteration:
mkdir -p snapshots
tar -czf "snapshots/pre-iteration-$ITERATION.tar.gz" src/
```
### Very Large Codebases
For codebases with 100K+ lines:
- **Reduce subagent parallelism:** Change "up to 500 parallel Sonnet subagents" to "up to 50" in prompts
- **Scope narrowly:** Use focused specs that target specific directories
- **Add path restrictions:** In AGENTS.md, note which directories are in-scope
- **Consider workspace splitting:** Treat large modules as separate Ralph projects
### When Claude CLI Isn't Available
The methodology works with any Claude interface:
**Claude API directly:**
```bash
# Replace loop.sh with API calls using curl or a script
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model": "claude-sonnet-4-20250514", "max_tokens": 8192, "messages": [...]}'
```
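The same request can be made from Node 18+, which has a global `fetch`. This is a minimal sketch of one fresh-context iteration, not a full loop: it sends the prompt file's text as a single user message and joins the text blocks of the reply. `buildMessagesRequest` and `replyText` are hypothetical helper names; the model ID is the one from the curl example.

```javascript
// Build a Messages API request for one fresh-context iteration.
function buildMessagesRequest(promptText, apiKey, model = "claude-sonnet-4-20250514") {
  return {
    url: "https://api.anthropic.com/v1/messages",
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model,
        max_tokens: 8192,
        messages: [{ role: "user", content: promptText }],
      }),
    },
  };
}

// Join the text blocks of a Messages API response body.
function replyText(responseJson) {
  return (responseJson.content || [])
    .filter((block) => block.type === "text")
    .map((block) => block.text)
    .join("");
}

// Usage (requires ANTHROPIC_API_KEY and network access):
// const { url, init } = buildMessagesRequest(prompt, process.env.ANTHROPIC_API_KEY);
// const res = await fetch(url, init);
// console.log(replyText(await res.json()));
```

Unlike the CLI, a bare API call does not read or edit files on its own; your script has to supply the relevant file contents and apply any changes the model proposes.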
**Alternative agents:**
- **Aider:** `aider --opus --auto-commits`
- **Continue.dev:** Use with Claude API key
- **Cursor:** Composer mode with PROMPT files as context
The key principles (one task per iteration, fresh context, backpressure) apply regardless of tooling.
### Non-Node.js Projects
Adapt AGENTS.md for your stack:
| Stack | Build | Test | Lint |
|-------|-------|------|------|
| Python | `pip install -e .` | `pytest` | `ruff check .` |
| Go | `go build ./...` | `go test ./...` | `golangci-lint run` |
| Rust | `cargo build` | `cargo test` | `cargo clippy` |
| Ruby | `bundle install` | `rspec` | `rubocop` |
Also update path references in prompts (`src/*` → your source directory).
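When bootstrapping `AGENTS.md` for an unfamiliar repo, you can apply the table mechanically by looking for each ecosystem's conventional marker file. A hypothetical sketch: commands follow the table (with the `ruff check` subcommand form), and the Node.js entry assumes `package.json` defines `build` and `lint` scripts. Adjust to your project.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Marker file -> validation commands. Node.js entry is an assumption.
const STACKS = [
  { stack: "Python", marker: "pyproject.toml", build: "pip install -e .", test: "pytest", lint: "ruff check ." },
  { stack: "Go", marker: "go.mod", build: "go build ./...", test: "go test ./...", lint: "golangci-lint run" },
  { stack: "Rust", marker: "Cargo.toml", build: "cargo build", test: "cargo test", lint: "cargo clippy" },
  { stack: "Ruby", marker: "Gemfile", build: "bundle install", test: "rspec", lint: "rubocop" },
  { stack: "Node.js", marker: "package.json", build: "npm run build", test: "npm test", lint: "npm run lint" },
];

// Hypothetical helper: first matching stack wins; null if none match.
function detectStack(projectDir) {
  return STACKS.find((s) => fs.existsSync(path.join(projectDir, s.marker))) || null;
}

// Render a Validation section in the same shape as the AGENTS.md example.
function validationSection({ build, test, lint }) {
  return [
    "## Validation",
    `- Build: \`${build}\``,
    `- Tests: \`${test}\``,
    `- Lint: \`${lint}\``,
  ].join("\n");
}
```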
---
## Learn More
- Geoffrey Huntley: https://ghuntley.com/ralph/
- Clayton Farr's Playbook: https://github.com/ClaytonFarr/ralph-playbook
- Geoffrey's Fork: https://github.com/ghuntley/how-to-ralph-wiggum
---
## Credits
Built by **Johnathan & Q** — a human-AI dyad.
- Twitter: [@spacepixel](https://x.com/spacepixel)
- ClawdHub: [clawhub.ai/skills/ralph-loops](https://www.clawhub.ai/skills/ralph-loops)