Introduction
# Memory Pipeline
**Give your AI agent a memory that actually works.**
AI agents wake up blank every session. Memory Pipeline fixes that — it extracts what matters from past conversations, connects the dots, and generates a daily briefing so your agent starts each session primed instead of clueless.
## What It Does
| Component | When it runs | What it does | |-----------|-------------|--------------| | **Extract** | Between sessions | Pulls structured facts (decisions, preferences, learnings) from daily notes and transcripts | | **Link** | Between sessions | Builds a knowledge graph — connects related facts, flags contradictions | | **Brief** | Between sessions | Generates a compact `BRIEFING.md` loaded at session start | | **Ingest** | On demand | Imports external knowledge (ChatGPT exports, etc.) into searchable memory | | **Performance Hooks** | During sessions | Pre-game briefing injection, tool discipline, output compression, after-action review |
## Why This Is Different
Most "memory" solutions are just vector search over chat logs. This is a **cognitive architecture** — inspired by how human memory actually works:
- **Extraction over accumulation** — Instead of dumping everything into a database, it identifies what's worth remembering: decisions, preferences, learnings, commitments. The rest is noise. - **Knowledge graph, not just embeddings** — Facts get linked to each other with bidirectional relationships. Your agent doesn't just find similar text — it understands that a decision about your tech stack relates to a project deadline relates to a preference you stated three weeks ago. - **Briefing over retrieval** — Rather than hoping the right context gets retrieved at query time, your agent starts every session with a curated cheat sheet. Active projects, recent decisions, personality reminders. Zero cold-start lag. - **No mid-swing coaching** — Borrowed from performance psychology. Corrections happen *between* sessions, not during. The after-action review feeds into the next briefing. The loop is closed — just not mid-execution.
## Quick Start
### Install
```bash clawdhub install memory-pipeline ```
### Setup
```bash bash skills/memory-pipeline/scripts/setup.sh ```
The setup script will detect your workspace, check dependencies (Python 3 + any LLM API key), create the `memory/` directory, and run the full pipeline.
### Requirements
- **Python 3** - **At least one LLM API key** (auto-detected): - OpenAI (`OPENAI_API_KEY` or `~/.config/openai/api_key`) - Anthropic (`ANTHROPIC_API_KEY` or `~/.config/anthropic/api_key`) - Gemini (`GEMINI_API_KEY` or `~/.config/gemini/api_key`)
### Run Manually
```bash # Full pipeline python3 skills/memory-pipeline/scripts/memory-extract.py python3 skills/memory-pipeline/scripts/memory-link.py python3 skills/memory-pipeline/scripts/memory-briefing.py ```
### Automate via Heartbeat
Add to your `HEARTBEAT.md` for daily automatic runs:
```markdown ### Daily Memory Pipeline - **Frequency:** Once per day (morning) - **Action:** Run the memory pipeline: 1. `python3 skills/memory-pipeline/scripts/memory-extract.py` 2. `python3 skills/memory-pipeline/scripts/memory-link.py` 3. `python3 skills/memory-pipeline/scripts/memory-briefing.py` ```
## Import External Knowledge
Already have years of conversations in ChatGPT? Import them so your agent knows what you know.
### ChatGPT Export
```bash # 1. Export from ChatGPT: Settings → Data Controls → Export Data # 2. Drop the zip in your workspace # 3. Run: python3 skills/memory-pipeline/scripts/ingest-chatgpt.py ~/imports/chatgpt-export.zip
# Preview first (recommended): python3 skills/memory-pipeline/scripts/ingest-chatgpt.py ~/imports/chatgpt-export.zip --dry-run ```
**What it does:** - Parses ChatGPT's conversation tree format - Filters out throwaway conversations (configurable: `--min-turns`, `--min-length`) - Supports topic exclusion (edit `EXCLUDE_PATTERNS` to skip unwanted topics) - Outputs clean, dated markdown files to `memory/knowledge/chatgpt/` - Files are automatically indexed by OpenClaw's semantic search
**Options:** - `--dry-run` — Preview without writing files - `--keep-all` — Skip all filtering - `--min-turns N` — Minimum user messages to keep (default: 2) - `--min-length N` — Minimum total characters (default: 200)
### Adding Other Sources
The pattern is extensible. Create `ingest-<source>.py`, parse the format, write markdown to `memory/knowledge/<source>/`. The indexer handles the rest.
## How the Pipeline Works
### Stage 1: Extract
**Script:** `memory-extract.py`
Reads daily notes (`memory/YYYY-MM-DD.md`) and session transcripts, then uses an LLM to extract structured facts:
```json {"type": "decision", "content": "Use Rust for the backend", "subject": "Project Architecture", "confidence": 0.9} {"type": "preference", "content": "Prefers Google Drive over Notion", "subject": "Tools", "confidence": 0.95} ```
**Output:** `memory/extracted.jsonl`
### Stage 2: Link
**Script:** `memory-link.py`
Takes extracted facts and builds a knowledge graph: - Generates embeddings for semantic similarity - Creates bidirectional links between related facts - Detects contradictions and marks superseded facts - Auto-generates domain tags
**Output:** `memory/knowledge-graph.json` + `memory/knowledge-summary.md`
### Stage 3: Briefing
**Script:** `memory-briefing.py`
Generates a compact daily briefing (< 2000 chars) combining: - Personality traits (from `SOUL.md`) - User context (from `USER.md`) - Active projects and recent decisions - Open todos
**Output:** `BRIEFING.md` (workspace root)
## Performance Hooks (Optional)
Four lifecycle hooks that enforce execution discipline during sessions. Based on a principle from performance psychology: **separate preparation from execution**.
``` User Message → Agent Loop ├── before_agent_start → Briefing packet (memory + checklist) ├── before_tool_call → Policy enforcement (deny list) ├── tool_result_persist → Output compression (prevent context bloat) └── agent_end → After-action review (durable notes) ```
### Configuration
```json { "enabled": true, "briefing": { "maxChars": 6000, "checklist": [ "Restate the task in one sentence.", "List constraints and success criteria.", "Retrieve only the minimum relevant memory.", "Prefer tools over guessing when facts matter." ], "memoryFiles": ["memory/IDENTITY.md", "memory/PROJECTS.md"] }, "tools": { "deny": ["dangerous_tool"], "maxToolResultChars": 12000 }, "afterAction": { "writeMemoryFile": "memory/AFTER_ACTION.md", "maxBullets": 8 } } ```
### Hook Details
| Hook | What it does | |------|-------------| | `before_agent_start` | Loads memory files, builds bounded briefing packet, injects into system prompt | | `before_tool_call` | Checks tool against deny list, prevents unsafe calls | | `tool_result_persist` | Head (60%) + tail (30%) compression of large results | | `agent_end` | Appends session summary to memory file with tools used and outcomes |
## Output Files
| File | Location | Purpose | |------|----------|---------| | `BRIEFING.md` | Workspace root | Daily context cheat sheet | | `extracted.jsonl` | `memory/` | All extracted facts (append-only) | | `knowledge-graph.json` | `memory/` | Full graph with embeddings and links | | `knowledge-summary.md` | `memory/` | Human-readable graph summary | | `knowledge/chatgpt/*.md` | `memory/` | Ingested ChatGPT conversations |
## Customization
- **Change LLM models** — Edit model names in each script (supports OpenAI, Anthropic, Gemini) - **Adjust extraction** — Modify the extraction prompt in `memory-extract.py` to focus on different fact types - **Tune link sensitivity** — Change the similarity threshold in `memory-link.py` (default: 0.3) - **Filter ingestion** — Edit `EXCLUDE_PATTERNS` in `ingest-chatgpt.py` for topic exclusion
## Troubleshooting
| Problem | Fix | |---------|-----| | No facts extracted | Check that daily notes or transcripts exist; verify API key | | Low-quality links | Add OpenAI key for embedding-based similarity; adjust threshold | | Briefing too long | Reduce facts in template or let LLM generation handle it (auto-constrained to 2000 chars) |
## See Also
- [Setup Guide](references/setup.md) — Detailed installation and configuration