Ralph Loops

Introduction

# Ralph Loops Skill

> **First time?** Read [SETUP.md](./SETUP.md) first to install dependencies and verify your setup.

Autonomous AI agent loops for iterative development. Based on Geoffrey Huntley's Ralph Wiggum technique, as documented by Clayton Farr.

**Script:** `skills/ralph-loops/scripts/ralph-loop.mjs` **Dashboard:** `skills/ralph-loops/dashboard/` (run with `node server.mjs`) **Templates:** `skills/ralph-loops/templates/` **Archive:** `~/clawd/logs/ralph-archive/`

---

## ⚠️ Known Issues

### Claude Code Version Compatibility

**Claude Code 2.1.29 has a critical bug** that spawns orphaned sub-agents consuming 99% CPU. Iterations fail with "exit code null" on first run.

**Fix:** Downgrade to 2.1.25: ```bash npm install -g @anthropic-ai/[email protected] ```

**Verify:** ```bash claude --version # Should show 2.1.25 ```

This was discovered 2026-02-01. Check if newer versions fix the issue before upgrading.

---

## ⚠️ Don't Block the Conversation!

When running a Ralph loop, **don't monitor it synchronously**. The loop runs as a separate Claude CLI process — you can keep chatting.

**❌ Wrong (blocks conversation):** ``` Start loop → sleep 60 → poll → sleep 60 → poll → ... (6 minutes of silence) ```

**✅ Right (stays responsive):** ``` Start loop → "It's running, I'll check periodically" → keep chatting → check on heartbeats ```

**How to monitor without blocking:** 1. Start the loop with `node ralph-loop.mjs ...` (runs in background) 2. Tell human: "Loop running. I'll check progress periodically or you can ask." 3. Check via `process poll <sessionId>` when asked or during heartbeats 4. Use the dashboard at http://localhost:3939 for real-time visibility

**The loop is autonomous** — that's the whole point. Don't babysit it at the cost of ignoring your human.

---

## Trigger Phrases

When human says:

| Phrase | Action | |--------|--------| | **"Interview me about system X"** | Start Phase 1 requirements interview | | **"Start planning system X"** | Run `./loop.sh plan` (needs specs first) | | **"Start building system X"** | Run `./loop.sh build` (needs plan first) | | **"Ralph loop over X"** | **ASK which phase** (see below) |

### When Human Says "Ralph Loop" — Clarify the Phase!

Don't assume which phase. Ask:

> "Which type of Ralph loop are we doing? > > 1️⃣ **Interview** — I'll ask you questions to build specs (Phase 1) > 2️⃣ **Planning** — I'll iterate on an implementation plan (Phase 2) > 3️⃣ **Building** — I'll implement from a plan, one task per iteration (Phase 3) > 4️⃣ **Generic** — Simple iterative refinement on a single topic"

**Then proceed based on their answer:**

| Choice | Action | |--------|--------| | Interview | Use `templates/requirements-interview.md` protocol | | Planning | Need specs first → run planning loop with `PROMPT_plan.md` | | Building | Need plan first → run build loop with `PROMPT_build.md` | | Generic | Create prompt file, run `ralph-loop.mjs` directly |

### Generic Ralph Loop Flow (Phase 4)

For simple iterative refinement (not full system builds):

1. **Clarify the task** — What exactly should be improved/refined? 2. **Create a prompt file** — Save to `/tmp/ralph-prompt-<task>.md` 3. **Set completion criteria** — What signals "done"? 4. **Run the loop:** ```bash node skills/ralph-loops/scripts/ralph-loop.mjs \ --prompt "/tmp/ralph-prompt-<task>.md" \ --model opus \ --max 10 \ --done "RALPH_DONE" ``` 5. **Or spawn as sub-agent** for long-running tasks

---

## Core Philosophy

> "Human roles shift from 'telling the agent what to do' to 'engineering conditions where good outcomes emerge naturally through iteration." > — Clayton Farr

Three principles drive everything:

1. **Context is scarce** — With ~176K usable tokens from a 200K window, keep each iteration lean 2. **Plans are disposable** — A drifting plan is cheaper to regenerate than salvage 3. **Backpressure beats direction** — Engineer environments where wrong outputs get rejected automatically

---

## Three-Phase Workflow

``` ┌─────────────────────────────────────────────────────────────────────┐ │ Phase 1: REQUIREMENTS │ │ Human + LLM conversation → JTBD → Topics → specs/*.md │ ├─────────────────────────────────────────────────────────────────────┤ │ Phase 2: PLANNING │ │ Gap analysis (specs vs code) → IMPLEMENTATION_PLAN.md │ ├─────────────────────────────────────────────────────────────────────┤ │ Phase 3: BUILDING │ │ One task per iteration → fresh context → backpressure → commit │ └─────────────────────────────────────────────────────────────────────┘ ```

### Phase 1: Requirements (Talk to Human)

**Goal:** Understand what to build BEFORE building it.

This is the most important phase. Use structured conversation to:

1. **Identify Jobs to Be Done (JTBD)** - What user need or outcome are we solving? - Not features — outcomes

2. **Break JTBD into Topics of Concern** - Each topic = one distinct aspect/component - Use the "one sentence without 'and'" test - ✓ "The color extraction system analyzes images to identify dominant colors" - ✗ "The user system handles authentication, profiles, and billing" → 3 topics

3. **Create Specs for Each Topic** - One markdown file per topic in `specs/` - Capture requirements, acceptance criteria, edge cases

**Template:** `templates/requirements-interview.md`

### Phase 2: Planning (Gap Analysis)

**Goal:** Create a prioritized task list without implementing anything.

Uses `PROMPT_plan.md` in the loop: - Study all specs - Study existing codebase - Compare specs vs code (gap analysis) - Generate `IMPLEMENTATION_PLAN.md` with prioritized tasks - **NO implementation** — planning only

Usually completes in 1-2 iterations.

### Phase 3: Building (One Task Per Iteration)

**Goal:** Implement tasks one at a time with fresh context.

Uses `PROMPT_build.md` in the loop: 1. Read `IMPLEMENTATION_PLAN.md` 2. Pick the most important task 3. Investigate codebase (don't assume not implemented) 4. Implement 5. Run validation (backpressure) 6. Update plan, commit 7. Exit → fresh context → next iteration

**Key insight:** One task per iteration keeps context lean. The agent stays in the "smart zone" instead of accumulating cruft.

**Why fresh context matters:** - **No accumulated mistakes** — Each iteration starts clean; previous errors don't compound - **Full context budget** — 200K tokens for THIS task, not shared with finished work - **Reduced hallucination** — Shorter contexts = more grounded responses - **Natural checkpoints** — Each commit is a save point; easy to revert single iterations

---

## File Structure

``` project/ ├── loop.sh # Ralph loop script ├── PROMPT_plan.md # Planning mode instructions ├── PROMPT_build.md # Building mode instructions ├── AGENTS.md # Operational guide (~60 lines max) ├── IMPLEMENTATION_PLAN.md # Prioritized task list (generated) └── specs/ # Requirement specs ├── topic-a.md ├── topic-b.md └── ... ```

### File Purposes

| File | Purpose | Who Creates | |------|---------|-------------| | `specs/*.md` | Source of truth for requirements | Human + Phase 1 | | `PROMPT_plan.md` | Instructions for planning mode | Copy from template | | `PROMPT_build.md` | Instructions for building mode | Copy from template | | `AGENTS.md` | Build/test/lint commands | Human + Ralph | | `IMPLEMENTATION_PLAN.md` | Task list with priorities | Ralph (Phase 2) |

### Project Organization (Systems)

For Clawdbot systems, each Ralph project lives in `<workspace>/systems/<name>/`:

``` systems/ ├── health-tracker/ # Example system │ ├── specs/ │ │ ├── daily-tracking.md │ │ └── test-scheduling.md │ ├── PROMPT_plan.md │ ├── PROMPT_build.md │ ├── AGENTS.md │ ├── IMPLEMENTATION_PLAN.md # ← exists = past Phase 1 │ └── src/ └── activity-planner/ ├── specs/ # ← empty = still in Phase 1 └── ... ```

### Phase Detection (Auto)

Detect current phase by checking what files exist:

| What Exists | Current Phase | Next Action | |-------------|---------------|-------------| | Nothing / empty `specs/` | Phase 1: Requirements | Run requirements interview | | `specs/*.md` but no `IMPLEMENTATION_PLAN.md` | Ready for Phase 2 | Run `./loop.sh plan` | | `specs/*.md` + `IMPLEMENTATION_PLAN.md` | Phase 2 or 3 | Review plan, run `./loop.sh build` | | Plan shows all tasks complete | Done | Archive or iterate |

**Quick check:** ```bash # What phase are we in? [ -z "$(ls specs/ 2>/dev/null)" ] && echo "Phase 1: Need specs" && exit [ ! -f IMPLEMENTATION_PLAN.md ] && echo "Phase 2: Need plan" && exit echo "Phase 3: Ready to build (or done)" ```

---

## JTBD Breakdown

The hierarchy matters:

``` JTBD (Job to Be Done) └── Topic of Concern (1 per spec file) └── Tasks (many per topic, in IMPLEMENTATION_PLAN.md) ```

**Example:** - **JTBD:** "Help designers create mood boards" - **Topics:** - Image collection → `specs/image-collection.md` - Color extraction → `specs/color-extraction.md` - Layout system → `specs/layout-system.md` - Sharing → `specs/sharing.md` - **Tasks:** Each spec generates multiple implementation tasks

### Topic Scope Test

> Can you describe the topic in one sentence without "and"?

If you need "and" or "also", it's probably multiple topics. Split it.

**When to split:** - Multiple verbs in the description → separate topics - Different user personas involved → separate topics - Could be implemented by different teams → separate topics - Has its own failure modes → probably its own topic

**Example split:** ``` ❌ "User management handles registration, authentication, profiles, and permissions"

✅ Split into: - "Registration creates new user accounts from email/password" - "Authentication verifies user identity via login flow" - "Profiles let users view and edit their information" - "Permissions control what actions users can perform" ```

**Counter-example (don't split):** ``` ✅ Keep together: "Color extraction analyzes images and returns dominant color palettes" Why: "analyzes" and "returns" are steps in one operation, not separate concerns. ```

---

## Backpressure Mechanisms

Autonomous loops converge when wrong outputs get rejected. Three layers:

### 1. Downstream Gates (Hard) Tests, type-checking, linting, build validation. Deterministic. ```markdown # In AGENTS.md ## Validation - Tests: `npm test` - Typecheck: `npm run typecheck` - Lint: `npm run lint` ```

### 2. Upstream Steering (Soft) Existing code patterns guide the agent. It discovers conventions through exploration.

### 3. LLM-as-Judge (Subjective) For subjective criteria (tone, UX, aesthetics), use another LLM call with binary pass/fail.

> Start with hard gates. Add LLM-as-judge for subjective criteria only after mechanical backpressure works.

---

## Prompt Structure

Geoffrey's prompts follow a numbered pattern:

| Section | Purpose | |---------|---------| | 0a-0d | **Orient:** Study specs, source, current plan | | 1-4 | **Main instructions:** What to do this iteration | | 999+ | **Guardrails:** Invariants (higher number = more critical) |

### The Numbered Guardrails Pattern

Guardrails use escalating numbers (99999, 999999, 9999999...) to signal priority:

```markdown 99999. Important: Capture the why in documentation.

999999. Important: Single sources of truth, no migrations.

9999999. Create git tags after successful builds.

99999999. Add logging if needed to debug.

999999999. Keep IMPLEMENTATION_PLAN.md current. ```

**Why this works:** 1. **Visual prominence** — Large numbers stand out, harder to skip 2. **Implicit priority** — More 9s = more critical (like DEFCON levels in reverse) 3. **No collisions** — Sparse numbering lets you insert new rules without renumbering 4. **Mnemonic** — Claude treats these as invariants, not suggestions

**The "Important:" prefix** is deliberate — it triggers Claude's attention.

### Key Language Patterns

Use Geoffrey's specific phrasing — it matters:

- "study" (not "read" or "look at") - "don't assume not implemented" (critical!) - "using parallel subagents" / "up to N subagents" - "only 1 subagent for build/tests" (backpressure control) - "Ultrathink" (deep reasoning trigger) - "capture the why" - "keep it up to date" - "resolve them or document them"

---

## Quick Start

### 1. Set Up Project Structure

```bash mkdir -p myproject/specs cd myproject git init # Ralph expects git for commits

# Copy templates cp .//templates/PROMPT_plan.md . cp .//templates/PROMPT_build.md . cp .//templates/AGENTS.md . cp .//templates/loop.sh . chmod +x loop.sh ```

### 2. Customize Templates (Required!)

**PROMPT_plan.md** — Replace `[PROJECT_GOAL]` with your actual goal: ```markdown # Before: ULTIMATE GOAL: We want to achieve [PROJECT_GOAL].

# After: ULTIMATE GOAL: We want to achieve a fully functional mood board app with image upload and color extraction. ```

**PROMPT_build.md** — Adjust source paths if not using `src/`: ```markdown # Before: 0c. For reference, the application source code is in `src/*`.

# After: 0c. For reference, the application source code is in `lib/*`. ```

**AGENTS.md** — Update build/test/lint commands for your stack.

### 3. Phase 1: Requirements Gathering (Don't Skip!)

This phase happens WITH the human. Use the interview template:

```bash cat .//templates/requirements-interview.md ```

**The workflow:** 1. Discuss the JTBD (Job to Be Done) — outcomes, not features 2. Break into Topics of Concern (each passes the "one sentence" test) 3. Write a spec file for each topic: `specs/topic-name.md` 4. Human reviews and approves specs

**Example output:** ``` specs/ ├── image-collection.md ├── color-extraction.md ├── layout-system.md └── sharing.md ```

### 4. Phase 2: Planning

```bash ./loop.sh plan ```

Wait for `IMPLEMENTATION_PLAN.md` to be generated (usually 1-2 iterations). Review it — this is your task list.

### 5. Phase 3: Building

```bash ./loop.sh build 20 # Max 20 iterations ```

Watch it work. Add backpressure (tests, lints) as patterns emerge. Check commits for progress.

---

## Loop Script Options

```bash ./loop.sh # Build mode, unlimited ./loop.sh 20 # Build mode, max 20 iterations ./loop.sh plan # Plan mode, unlimited ./loop.sh plan 5 # Plan mode, max 5 iterations ```

Or use the Node.js wrapper for more control:

```bash node skills/ralph-loops/scripts/ralph-loop.mjs \ --prompt "./PROMPT_build.md" \ --model opus \ --max 20 \ --done "RALPH_DONE" ```

---

## When to Regenerate the Plan

Plans drift. Regenerate when:

- Ralph is going off track (implementing wrong things) - Plan feels stale or doesn't match current state - Too much clutter from completed items - You've made significant spec changes - You're confused about what's actually done

Just switch back to planning mode:

```bash ./loop.sh plan ```

Regeneration cost is one Planning loop. Cheap compared to Ralph going in circles.

---

## Safety

Ralph requires `--dangerously-skip-permissions` to run autonomously. This bypasses Claude's permission system entirely.

**Philosophy:** "It's not if it gets popped, it's when. And what is the blast radius?"

**Protections:** - Run in isolated environments (Docker, VM) - Only the API keys needed for the task - No access to private data beyond requirements - Restrict network connectivity where possible - **Escape hatches:** Ctrl+C stops the loop; `git reset --hard` reverts uncommitted changes

---

## Cost Expectations

| Task Type | Model | Iterations | Est. Cost | |-----------|-------|------------|-----------| | Generate plan | Opus | 1-2 | $0.50-1.00 | | Implement simple feature | Opus | 3-5 | $1.00-2.00 | | Implement complex feature | Opus | 10-20 | $3.00-8.00 | | Full project buildout | Opus | 50+ | $15-50+ |

**Tip:** Use Sonnet for simpler tasks where plan is clear. Use Opus for planning and complex reasoning.

---

## Real-World Results

From Geoffrey Huntley: - 6 repos generated overnight at YC hackathon - $50k contract completed for $297 in API costs - Created entire programming language over 3 months

---

## Advanced: Running as Sub-Agent

For long loops, spawn as sub-agent so main session stays responsive:

```javascript sessions_spawn({ task: `cd /path/to/project && ./loop.sh build 20 Summarize what was implemented when done.`, label: "ralph-build", model: "opus" }) ```

Check progress: ```javascript sessions_list({ kinds: ["spawn"] }) sessions_history({ label: "ralph-build", limit: 5 }) ```

---

## Troubleshooting

### Ralph keeps implementing the same thing - Plan is stale → regenerate with `./loop.sh plan` - Backpressure missing → add tests that catch duplicates

### Ralph goes in circles - Add more specific guardrails to prompts - Check if specs are ambiguous - Regenerate plan

### Context getting bloated - Ensure one task per iteration (check prompt) - Keep AGENTS.md under 60 lines - Move status/progress to IMPLEMENTATION_PLAN.md, not AGENTS.md

### Tests not running - Check AGENTS.md has correct validation commands - Ensure backpressure section in prompt references AGENTS.md

---

## Edge Cases

### Projects Without Git

The loop script expects git for commits and pushes. For projects without version control:

**Option 1: Initialize git anyway** (recommended) ```bash git init git add -A git commit -m "Initial commit before Ralph" ```

**Option 2: Modify the prompts** - Remove git-related guardrails from PROMPT_build.md - Remove the git push section from loop.sh - Use file backups instead: add `cp -r src/ backups/iteration-$ITERATION/` to loop.sh

**Option 3: Use tarball snapshots** ```bash # Add to loop.sh before each iteration: tar -czf "snapshots/pre-iteration-$ITERATION.tar.gz" src/ ```

### Very Large Codebases

For codebases with 100K+ lines:

- **Reduce subagent parallelism:** Change "up to 500 parallel Sonnet subagents" to "up to 50" in prompts - **Scope narrowly:** Use focused specs that target specific directories - **Add path restrictions:** In AGENTS.md, note which directories are in-scope - **Consider workspace splitting:** Treat large modules as separate Ralph projects

### When Claude CLI Isn't Available

The methodology works with any Claude interface:

**Claude API directly:** ```bash # Replace loop.sh with API calls using curl or a script curl https://api.anthropic.com/v1/messages \ -H "x-api-key: $ANTHROPIC_API_KEY" \ -H "content-type: application/json" \ -d '{"model": "claude-sonnet-4-20250514", "max_tokens": 8192, "messages": [...]}' ```

**Alternative agents:** - **Aider:** `aider --opus --auto-commits` - **Continue.dev:** Use with Claude API key - **Cursor:** Composer mode with PROMPT files as context

The key principles (one task per iteration, fresh context, backpressure) apply regardless of tooling.

### Non-Node.js Projects

Adapt AGENTS.md for your stack:

| Stack | Build | Test | Lint | |-------|-------|------|------| | Python | `pip install -e .` | `pytest` | `ruff .` | | Go | `go build ./...` | `go test ./...` | `golangci-lint run` | | Rust | `cargo build` | `cargo test` | `cargo clippy` | | Ruby | `bundle install` | `rspec` | `rubocop` |

Also update path references in prompts (`src/*` → your source directory).

---

## Learn More

- Geoffrey Huntley: https://ghuntley.com/ralph/ - Clayton Farr's Playbook: https://github.com/ClaytonFarr/ralph-playbook - Geoffrey's Fork: https://github.com/ghuntley/how-to-ralph-wiggum

---

## Credits

Built by **Johnathan & Q** — a human-AI dyad.

- Twitter: [@spacepixel](https://x.com/spacepixel) - ClawdHub: [clawhub.ai/skills/ralph-loops](https://www.clawhub.ai/skills/ralph-loops)

Back

Introduction

More Products

Nano Banana Pro

Gemini

Pg Release