
Memory Pipeline

Complete agent memory + performance system. Extracts structured facts, builds knowledge graphs, generates briefings, and enforces execution discipline via pre-g


# Memory Pipeline

**Give your AI agent a memory that actually works.**

AI agents wake up blank every session. Memory Pipeline fixes that — it extracts what matters from past conversations, connects the dots, and generates a daily briefing so your agent starts each session primed instead of clueless.

## What It Does

| Component | When it runs | What it does |
|-----------|-------------|--------------|
| **Extract** | Between sessions | Pulls structured facts (decisions, preferences, learnings) from daily notes and transcripts |
| **Link** | Between sessions | Builds a knowledge graph that connects related facts and flags contradictions |
| **Brief** | Between sessions | Generates a compact `BRIEFING.md` loaded at session start |
| **Ingest** | On demand | Imports external knowledge (ChatGPT exports, etc.) into searchable memory |
| **Performance Hooks** | During sessions | Pre-game briefing injection, tool discipline, output compression, after-action review |

## Why This Is Different

Most "memory" solutions are just vector search over chat logs. This is a **cognitive architecture** — inspired by how human memory actually works:

- **Extraction over accumulation**: Instead of dumping everything into a database, it identifies what's worth remembering (decisions, preferences, learnings, commitments). The rest is noise.
- **Knowledge graph, not just embeddings**: Facts are linked to each other with bidirectional relationships. Your agent doesn't just find similar text; it can follow the links from a decision about your tech stack to a related project deadline, and from there to a preference you stated three weeks ago.
- **Briefing over retrieval**: Rather than hoping the right context gets retrieved at query time, your agent starts every session with a curated cheat sheet: active projects, recent decisions, personality reminders. Zero cold-start lag.
- **No mid-swing coaching**: Borrowed from performance psychology. Corrections happen *between* sessions, not during them. The after-action review feeds into the next briefing, so the loop is closed, just not mid-execution.

## Quick Start

### Install

```bash
clawdhub install memory-pipeline
```

### Setup

```bash
bash skills/memory-pipeline/scripts/setup.sh
```

The setup script will detect your workspace, check dependencies (Python 3 + any LLM API key), create the `memory/` directory, and run the full pipeline.

### Requirements

- **Python 3**
- **At least one LLM API key** (auto-detected):
  - OpenAI (`OPENAI_API_KEY` or `~/.config/openai/api_key`)
  - Anthropic (`ANTHROPIC_API_KEY` or `~/.config/anthropic/api_key`)
  - Gemini (`GEMINI_API_KEY` or `~/.config/gemini/api_key`)
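A minimal sketch of how the auto-detection described above might work, assuming providers are checked in the order listed and that an environment variable wins over its key file. `detect_api_key` and `PROVIDERS` are hypothetical names for illustration, not the setup script's actual code.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple

# Environment variable for each provider, in the order listed above.
# The lookup order (OpenAI, then Anthropic, then Gemini) is an assumption.
PROVIDERS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def detect_api_key(
    env: Optional[Mapping[str, str]] = None, config_dir: Optional[str] = None
) -> Optional[Tuple[str, str]]:
    """Return (provider, key) for the first provider with a key, or None.

    Hypothetical helper: checks the environment variable first, then
    <config_dir>/<provider>/api_key (config_dir defaults to ~/.config).
    """
    env = os.environ if env is None else env
    base = Path.home() / ".config" if config_dir is None else Path(config_dir)
    for provider, var in PROVIDERS.items():
        value = env.get(var, "").strip()
        if value:
            return provider, value
        key_file = base / provider / "api_key"
        if key_file.is_file():
            value = key_file.read_text(encoding="utf-8").strip()
            if value:
                return provider, value
    return None
```

A `None` result is the point where a setup script would stop and ask the user to configure a key.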

### Run Manually

```bash
# Full pipeline
python3 skills/memory-pipeline/scripts/memory-extract.py
python3 skills/memory-pipeline/scripts/memory-link.py
python3 skills/memory-pipeline/scripts/memory-briefing.py
```
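The stages must run in order, since `memory-link.py` works on the facts that `memory-extract.py` produces. A small Python wrapper like the sketch below can stop at the first failing stage instead of running the later ones anyway. `run_pipeline` and `stage_commands` are hypothetical helpers, not part of the skill.

```python
import subprocess
import sys
from pathlib import Path

SCRIPTS_DIR = Path("skills/memory-pipeline/scripts")
# Order matters: memory-link.py works on the facts memory-extract.py writes.
STAGES = ["memory-extract.py", "memory-link.py", "memory-briefing.py"]


def stage_commands(scripts_dir=SCRIPTS_DIR):
    """Build one `python3 <script>` command per stage, in pipeline order."""
    return [[sys.executable, str(Path(scripts_dir) / name)] for name in STAGES]


def run_pipeline(commands):
    """Run commands in order, stopping at the first non-zero exit code.

    Returns the commands that completed successfully.
    """
    completed = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"stage failed (exit {result.returncode}): {' '.join(cmd)}", file=sys.stderr)
            break
        completed.append(cmd)
    return completed
```

Run from the workspace root, `run_pipeline(stage_commands())` executes the same three commands as above; stopping early keeps a failed stage from silently handing older output to the next one.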

### Automate via Heartbeat

Add to your `HEARTBEAT.md` for daily automatic runs:

```markdown
### Daily Memory Pipeline
- **Frequency:** Once per day (morning)
- **Action:** Run the memory pipeline:
  1. `python3 skills/memory-pipeline/scripts/memory-extract.py`
  2. `python3 skills/memory-pipeline/scripts/memory-link.py`
  3. `python3 skills/memory-pipeline/scripts/memory-briefing.py`
```

## Import External Knowledge

Already have years of conversations in ChatGPT? Import them so your agent knows what you know.

### ChatGPT Export

```bash
# 1. Export from ChatGPT: Settings → Data Controls → Export Data
# 2. Save the zip locally (this example uses ~/imports/)
# 3. Run:
python3 skills/memory-pipeline/scripts/ingest-chatgpt.py ~/imports/chatgpt-export.zip

# Preview first (recommended):
python3 skills/memory-pipeline/scripts/ingest-chatgpt.py ~/imports/chatgpt-export.zip --dry-run
```

**What it does:**
- Parses ChatGPT's conversation tree format
- Filters out throwaway conversations (configurable via `--min-turns` and `--min-length`)
- Supports topic exclusion (edit `EXCLUDE_PATTERNS` to skip unwanted topics)
- Outputs clean, dated markdown files to `memory/knowledge/chatgpt/`
- Files are automatically indexed by OpenClaw's semantic search
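ChatGPT stores each conversation as a tree, because edited prompts and regenerated answers create sibling branches. One way to flatten it, sketched below, is to start at the conversation's `current_node` and follow `parent` links back to the root, which keeps only the branch that was active at export time. This assumes the `conversations.json` layout of current exports (`mapping`, `message.author.role`, `message.content.parts`); `linearize` is a hypothetical name, and `ingest-chatgpt.py` may do this differently.

```python
from typing import List, Tuple


def linearize(conversation: dict) -> List[Tuple[str, str]]:
    """Flatten one exported conversation into (role, text) pairs, oldest first.

    Walks from `current_node` back to the root via `parent` links and keeps
    only non-empty user/assistant text (system and tool nodes are skipped).
    """
    mapping = conversation.get("mapping", {})
    node_id = conversation.get("current_node")
    turns = []
    seen = set()  # guards against malformed exports with parent cycles
    while node_id and node_id not in seen:
        seen.add(node_id)
        node = mapping.get(node_id, {})
        message = node.get("message") or {}
        role = (message.get("author") or {}).get("role")
        parts = (message.get("content") or {}).get("parts") or []
        text = "\n".join(p for p in parts if isinstance(p, str)).strip()
        if role in ("user", "assistant") and text:
            turns.append((role, text))
        node_id = node.get("parent")
    turns.reverse()  # we walked leaf-to-root, so flip to chronological order
    return turns
```

Following `current_node` rather than the first child at each level is what drops abandoned drafts, such as an answer the user regenerated.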

**Options:**
- `--dry-run`: Preview without writing files
- `--keep-all`: Skip all filtering
- `--min-turns N`: Minimum number of user messages to keep a conversation (default: 2)
- `--min-length N`: Minimum total characters (default: 200)
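The filters above can be pictured as a single predicate over a flattened conversation. This sketch assumes `--min-length` counts characters across all messages and that `--keep-all` also bypasses topic exclusion ("skip all filtering"). `keep_conversation` is a hypothetical name, and the patterns below are placeholders, not the script's defaults.

```python
import re

# Placeholder patterns for illustration; the real list is EXCLUDE_PATTERNS in ingest-chatgpt.py.
EXCLUDE_PATTERNS = (r"\bpassword\b", r"\bmedical\b")


def keep_conversation(title, turns, min_turns=2, min_length=200,
                      exclude_patterns=EXCLUDE_PATTERNS, keep_all=False):
    """Decide whether a flattened conversation is worth writing to memory.

    turns: list of (role, text) pairs. Assumes --min-length counts characters
    across all messages and that --keep-all bypasses every filter.
    """
    if keep_all:
        return True
    user_turns = sum(1 for role, _ in turns if role == "user")
    total_chars = sum(len(text) for _, text in turns)
    if user_turns < min_turns or total_chars < min_length:
        return False
    # Topic exclusion looks at the title and every message, case-insensitively.
    haystack = "\n".join([title] + [text for _, text in turns])
    return not any(re.search(p, haystack, re.IGNORECASE) for p in exclude_patterns)
```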

### Adding Other Sources

The pattern is extensible. Create `ingest-<source>.py`, parse the format, write markdown to `memory/knowledge/<source>/`. The indexer handles the rest.
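A skeleton for the output side of a new `ingest-<source>.py`, assuming one markdown file per conversation named `<YYYY-MM-DD>-<slug>.md` under `memory/knowledge/<source>/`. The naming scheme and the helpers `slugify`, `to_utc`, and `write_markdown` are illustrative assumptions; match whatever layout `ingest-chatgpt.py` actually writes.

```python
import re
from datetime import datetime, timezone
from pathlib import Path


def slugify(title):
    """Lowercase, collapse runs of non-alphanumerics to '-', cap at 60 chars."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:60].rstrip("-") or "untitled"


def to_utc(timestamp):
    """Convert an epoch timestamp (e.g. an export's create_time) to an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def write_markdown(source, title, created, turns, root="memory/knowledge"):
    """Write one conversation to <root>/<source>/<YYYY-MM-DD>-<slug>.md and return the path."""
    out_dir = Path(root) / source
    out_dir.mkdir(parents=True, exist_ok=True)
    date = created.strftime("%Y-%m-%d")
    path = out_dir / f"{date}-{slugify(title)}.md"
    lines = [f"# {title}", "", f"Date: {date} | Source: {source}", ""]
    for role, text in turns:
        lines += [f"**{role}:** {text}", ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Putting the date first in the filename keeps files sorted chronologically in a directory listing.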

## How the Pipeline Works

### Stage 1: Extract

**Script:** `memory-extract.py`

Reads daily notes (`memory/YYYY-MM-DD.md`) and session transcripts, then uses an LLM to extract structured facts:

```json
{"type": "decision", "content": "Use Rust for the backend", "subject": "Project Architecture", "confidence": 0.9}
{"type": "preference", "content": "Prefers Google Drive over Notion", "subject": "Tools", "confidence": 0.95}
```

**Output:** `memory/extracted.jsonl`
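Because `extracted.jsonl` holds one JSON object per line, new facts can be appended without rewriting the file. The sketch below shows one way to validate and append facts in that shape. The `Fact` class and both helpers are hypothetical, and restricting `type` to decision, preference, learning, and commitment is an assumption based on the fact types this README names.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path

# Fact types named in this README; the extractor's actual set may differ.
FACT_TYPES = {"decision", "preference", "learning", "commitment"}


@dataclass
class Fact:
    type: str
    content: str
    subject: str
    confidence: float

    def __post_init__(self):
        # Reject malformed LLM output before it reaches the file.
        if self.type not in FACT_TYPES:
            raise ValueError(f"unknown fact type: {self.type!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def append_facts(path, facts):
    """Append facts as one JSON object per line (the file is append-only)."""
    with Path(path).open("a", encoding="utf-8") as f:
        for fact in facts:
            f.write(json.dumps(asdict(fact)) + "\n")


def load_facts(path):
    """Read every non-blank line back into Fact objects; extra keys are ignored."""
    facts = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                r = json.loads(line)
                facts.append(Fact(r["type"], r["content"], r["subject"], float(r["confidence"])))
    return facts
```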

### Stage 2: Link

**Script:** `memory-link.py`

Takes extracted facts and builds a knowledge graph:
- Generates embeddings for semantic similarity
- Creates bidirectional links between related facts
- Detects contradictions and marks superseded facts
- Auto-generates domain tags

**Output:** `memory/knowledge-graph.json` + `memory/knowledge-summary.md`
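A minimal sketch of the linking step, assuming two facts are linked when the cosine similarity of their embeddings is at least the threshold (0.3 by default, per the Customization section). Embeddings are plain lists of floats here; `cosine` and `link_facts` are hypothetical names, and contradiction detection and tagging are left out.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_facts(embeddings, threshold=0.3):
    """Map each fact id to the set of ids it is linked to.

    embeddings: {fact_id: vector}. Every qualifying pair is added in both
    directions, which is what makes the links bidirectional.
    """
    links = {fact_id: set() for fact_id in embeddings}
    for a, b in combinations(embeddings, 2):
        if cosine(embeddings[a], embeddings[b]) >= threshold:
            links[a].add(b)
            links[b].add(a)
    return links
```

Comparing every pair is O(n²), which stays cheap for a personal fact store; a larger store would want an approximate nearest-neighbour index instead.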

### Stage 3: Briefing

**Script:** `memory-briefing.py`

Generates a compact daily briefing (< 2000 chars) combining:
- Personality traits (from `SOUL.md`)
- User context (from `USER.md`)
- Active projects and recent decisions
- Open todos

**Output:** `BRIEFING.md` (workspace root)
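One way to respect the character budget is to render sections in priority order and drop whole bullets once the budget runs out, so nothing is cut mid-line. In this sketch `build_briefing` is hypothetical, and the caller is assumed to pass sections (personality, user context, projects, todos) already ordered by importance.

```python
MAX_BRIEFING_CHARS = 2000


def build_briefing(sections, limit=MAX_BRIEFING_CHARS):
    """Render (heading, bullets) sections in priority order, staying under `limit`.

    Whole bullets are dropped once the budget is spent, so no line is cut
    mid-way. Sections where no bullet fits are omitted entirely.
    """
    out = ["# Briefing"]
    used = len(out[0])
    for heading, bullets in sections:
        header = f"\n\n## {heading}"
        lines, size = [], len(header)
        for bullet in bullets:
            line = f"\n- {bullet}"
            if used + size + len(line) >= limit:
                break
            lines.append(line)
            size += len(line)
        if lines:
            out.append(header + "".join(lines))
            used += size
    return "".join(out)
```

Because lower-priority sections come last, they are the first to lose bullets when the day's context is large.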

## Performance Hooks (Optional)

Four lifecycle hooks that enforce execution discipline during sessions. Based on a principle from performance psychology: **separate preparation from execution**.

```
User Message → Agent Loop
  ├── before_agent_start  → Briefing packet (memory + checklist)
  ├── before_tool_call    → Policy enforcement (deny list)
  ├── tool_result_persist → Output compression (prevent context bloat)
  └── agent_end           → After-action review (durable notes)
```

### Configuration

```json
{
  "enabled": true,
  "briefing": {
    "maxChars": 6000,
    "checklist": [
      "Restate the task in one sentence.",
      "List constraints and success criteria.",
      "Retrieve only the minimum relevant memory.",
      "Prefer tools over guessing when facts matter."
    ],
    "memoryFiles": ["memory/IDENTITY.md", "memory/PROJECTS.md"]
  },
  "tools": {
    "deny": ["dangerous_tool"],
    "maxToolResultChars": 12000
  },
  "afterAction": {
    "writeMemoryFile": "memory/AFTER_ACTION.md",
    "maxBullets": 8
  }
}
```
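Roughly what `before_agent_start` has to do with the `briefing` block above: combine the checklist with whichever `memoryFiles` exist, then cut the result to `maxChars`. This is a Python illustration only; the real hook runs inside the agent runtime, and `build_briefing_packet`, the section headings, and skipping missing files are assumptions.

```python
from pathlib import Path


def build_briefing_packet(config, workspace="."):
    """Assemble checklist + memory files into one packet bounded by briefing.maxChars.

    Hypothetical helper. Missing memory files are skipped rather than treated as errors.
    """
    briefing = config.get("briefing", {})
    max_chars = briefing.get("maxChars", 6000)  # 6000 mirrors the sample config
    parts = ["## Checklist"] + [f"- {item}" for item in briefing.get("checklist", [])]
    for rel in briefing.get("memoryFiles", []):
        path = Path(workspace) / rel
        if path.is_file():
            parts += ["", f"## {rel}", path.read_text(encoding="utf-8").strip()]
    packet = "\n".join(parts)
    return packet[:max_chars]  # hard cap so the system prompt never grows unbounded
```

Putting the checklist first means it survives truncation even when the memory files are long.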

### Hook Details

| Hook | What it does |
|------|-------------|
| `before_agent_start` | Loads memory files, builds a bounded briefing packet, and injects it into the system prompt |
| `before_tool_call` | Checks the tool against the deny list and blocks unsafe calls |
| `tool_result_persist` | Head (60%) + tail (30%) compression of large results |
| `agent_end` | Appends a session summary (tools used, outcomes) to the memory file |
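The two tool hooks reduce to small pure functions. This sketch assumes the 60% head and 30% tail are fractions of `maxToolResultChars` (not of the original result) and that the elision marker is not counted against the budget; `allow_tool_call` and `compress_result` are hypothetical names.

```python
def allow_tool_call(tool_name, config):
    """before_tool_call sketch: refuse any tool listed in tools.deny."""
    return tool_name not in set(config.get("tools", {}).get("deny", []))


def compress_result(text, max_chars=12000, head=0.6, tail=0.3):
    """tool_result_persist sketch: keep the first 60% and last 30% of the budget.

    Results at or under max_chars pass through unchanged. The marker that
    replaces the middle is not counted against the budget.
    """
    if len(text) <= max_chars:
        return text
    head_len = int(max_chars * head)
    tail_len = int(max_chars * tail)
    omitted = len(text) - head_len - tail_len
    marker = f"\n[... {omitted} chars omitted ...]\n"
    # Slice the tail by start index so tail_len == 0 yields "" rather than the whole text.
    return text[:head_len] + marker + text[len(text) - tail_len:]
```

Keeping the tail as well as the head matters for tool output: error messages and final summaries often appear at the end of a long result.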

## Output Files

| File | Location | Purpose |
|------|----------|---------|
| `BRIEFING.md` | Workspace root | Daily context cheat sheet |
| `extracted.jsonl` | `memory/` | All extracted facts (append-only) |
| `knowledge-graph.json` | `memory/` | Full graph with embeddings and links |
| `knowledge-summary.md` | `memory/` | Human-readable graph summary |
| `knowledge/chatgpt/*.md` | `memory/` | Ingested ChatGPT conversations |

## Customization

- **Change LLM models**: Edit the model names in each script (supports OpenAI, Anthropic, Gemini)
- **Adjust extraction**: Modify the extraction prompt in `memory-extract.py` to focus on different fact types
- **Tune link sensitivity**: Change the similarity threshold in `memory-link.py` (default: 0.3)
- **Filter ingestion**: Edit `EXCLUDE_PATTERNS` in `ingest-chatgpt.py` to exclude topics

## Troubleshooting

| Problem | Fix |
|---------|-----|
| No facts extracted | Check that daily notes or transcripts exist; verify your API key |
| Low-quality links | Add an OpenAI key for embedding-based similarity; adjust the similarity threshold |
| Briefing too long | Reduce the facts in the template, or let LLM generation handle it (its output is constrained to 2000 chars) |

## See Also

- [Setup Guide](references/setup.md) — Detailed installation and configuration
