# Memory Pipeline

**Give your AI agent a memory that actually works.**

AI agents wake up blank every session. Memory Pipeline fixes that — it extracts what matters from past conversations, connects the dots, and generates a daily briefing so your agent starts each session primed instead of clueless.

## What It Does

| Component | When it runs | What it does |
|-----------|-------------|--------------|
| **Extract** | Between sessions | Pulls structured facts (decisions, preferences, learnings) from daily notes and transcripts |
| **Link** | Between sessions | Builds a knowledge graph: connects related facts, flags contradictions |
| **Brief** | Between sessions | Generates a compact `BRIEFING.md` loaded at session start |
| **Ingest** | On demand | Imports external knowledge (ChatGPT exports, etc.) into searchable memory |
| **Performance Hooks** | During sessions | Pre-game briefing injection, tool discipline, output compression, after-action review |

## Why This Is Different

Most "memory" solutions are just vector search over chat logs. This is a **cognitive architecture** — inspired by how human memory actually works:

- **Extraction over accumulation** — Instead of dumping everything into a database, it identifies what's worth remembering: decisions, preferences, learnings, commitments. The rest is noise.

- **Knowledge graph, not just embeddings** — Facts get linked to each other with bidirectional relationships. Your agent doesn't just find similar text — it understands that a decision about your tech stack relates to a project deadline relates to a preference you stated three weeks ago.

- **Briefing over retrieval** — Rather than hoping the right context gets retrieved at query time, your agent starts every session with a curated cheat sheet. Active projects, recent decisions, personality reminders. Zero cold-start lag.

- **No mid-swing coaching** — Borrowed from performance psychology. Corrections happen *between* sessions, not during. The after-action review feeds into the next briefing. The loop is closed — just not mid-execution.

## Quick Start

### Install

```bash
clawdhub install memory-pipeline
```

### Setup

```bash
bash skills/memory-pipeline/scripts/setup.sh
```

The setup script will detect your workspace, check dependencies (Python 3 + any LLM API key), create the `memory/` directory, and run the full pipeline.

### Requirements

- **Python 3**
- **At least one LLM API key** (auto-detected):
  - OpenAI (`OPENAI_API_KEY` or `~/.config/openai/api_key`)
  - Anthropic (`ANTHROPIC_API_KEY` or `~/.config/anthropic/api_key`)
  - Gemini (`GEMINI_API_KEY` or `~/.config/gemini/api_key`)

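The auto-detection described above could look roughly like the following. This is a minimal sketch, not the setup script's actual code: the function name `detect_api_key`, the search order, and the rule that an environment variable wins over a key file are all assumptions.

```python
import os
from pathlib import Path

# Provider -> (environment variable, key file relative to the home directory),
# taken from the list of supported providers in this README.
KEY_SOURCES = {
    "openai": ("OPENAI_API_KEY", ".config/openai/api_key"),
    "anthropic": ("ANTHROPIC_API_KEY", ".config/anthropic/api_key"),
    "gemini": ("GEMINI_API_KEY", ".config/gemini/api_key"),
}


def detect_api_key(env=None, home=None):
    """Return (provider, key) for the first provider that has a key, else None.

    `env` and `home` can be injected for testing; by default the real
    environment and home directory are used. The order is an assumption.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    for provider, (var, key_file) in KEY_SOURCES.items():
        key = env.get(var, "").strip()
        if key:
            return provider, key
        path = home / key_file
        if path.is_file():
            key = path.read_text(encoding="utf-8").strip()
            if key:
                return provider, key
    return None
```

Calling `detect_api_key()` with no arguments checks the real environment, which is a quick way to see which provider would be picked up before running setup.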

### Run Manually

```bash
# Full pipeline
python3 skills/memory-pipeline/scripts/memory-extract.py
python3 skills/memory-pipeline/scripts/memory-link.py
python3 skills/memory-pipeline/scripts/memory-briefing.py
```
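If you would rather drive the three stages from Python, a small wrapper along these lines may help. It is a hypothetical helper, not part of the skill: `run_pipeline` is an invented name, and stopping at the first failing stage is a design choice of this sketch (the shell commands above run each script regardless).

```python
import subprocess
import sys

SCRIPTS_DIR = "skills/memory-pipeline/scripts"
STAGES = ["memory-extract.py", "memory-link.py", "memory-briefing.py"]


def run_pipeline(run=subprocess.run):
    """Run each stage in order and stop at the first failure.

    Returns the list of stages that completed successfully. `run` is
    injectable so the function can be exercised without the real scripts.
    """
    completed = []
    for stage in STAGES:
        result = run([sys.executable, f"{SCRIPTS_DIR}/{stage}"])
        if result.returncode != 0:
            print(f"{stage} failed with exit code {result.returncode}", file=sys.stderr)
            break
        completed.append(stage)
    return completed
```

Stopping early avoids linking or briefing from a half-written `extracted.jsonl`; drop the `break` if you prefer the later stages to run anyway.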

### Automate via Heartbeat

Add to your `HEARTBEAT.md` for daily automatic runs:

```markdown
### Daily Memory Pipeline
- **Frequency:** Once per day (morning)
- **Action:** Run the memory pipeline:
  1. `python3 skills/memory-pipeline/scripts/memory-extract.py`
  2. `python3 skills/memory-pipeline/scripts/memory-link.py`
  3. `python3 skills/memory-pipeline/scripts/memory-briefing.py`
```

## Import External Knowledge

Already have years of conversations in ChatGPT? Import them so your agent knows what you know.

### ChatGPT Export

```bash
# 1. Export from ChatGPT: Settings → Data Controls → Export Data
# 2. Drop the zip in your workspace
# 3. Run:
python3 skills/memory-pipeline/scripts/ingest-chatgpt.py ~/imports/chatgpt-export.zip

# Preview first (recommended):
python3 skills/memory-pipeline/scripts/ingest-chatgpt.py ~/imports/chatgpt-export.zip --dry-run
```

**What it does:**

- Parses ChatGPT's conversation tree format
- Filters out throwaway conversations (configurable: `--min-turns`, `--min-length`)
- Supports topic exclusion (edit `EXCLUDE_PATTERNS` to skip unwanted topics)
- Outputs clean, dated markdown files to `memory/knowledge/chatgpt/`
- Files are automatically indexed by OpenClaw's semantic search

**Options:**

- `--dry-run` — Preview without writing files

- `--keep-all` — Skip all filtering

- `--min-turns N` — Minimum user messages to keep (default: 2)

- `--min-length N` — Minimum total characters (default: 200)

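To make the filtering options concrete, here is one way the keep/skip decision could be expressed. This is an illustrative sketch only: the function name `keep_conversation`, the `(role, text)` message shape, and counting characters across all roles are assumptions, not the behaviour of `ingest-chatgpt.py`.

```python
def keep_conversation(messages, min_turns=2, min_length=200, keep_all=False):
    """Decide whether a conversation is worth importing.

    `messages` is a list of (role, text) tuples (a hypothetical shape).
    Defaults mirror the documented --min-turns and --min-length defaults.
    """
    if keep_all:
        # --keep-all skips all filtering.
        return True
    user_turns = sum(1 for role, _ in messages if role == "user")
    total_chars = sum(len(text) for _, text in messages)
    return user_turns >= min_turns and total_chars >= min_length
```

With the defaults, a one-line exchange such as `[("user", "hi"), ("assistant", "hello")]` is dropped, while the same exchange passes when `keep_all=True`.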

### Adding Other Sources

The pattern is extensible. Create `ingest-<source>.py`, parse the format, write markdown to `memory/knowledge/<source>/`. The indexer handles the rest.

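As a starting point for a new `ingest-<source>.py`, the writing half of the pattern might look like this. It is a sketch under assumptions: `write_conversation`, `slugify`, and the `<date>-<slug>.md` file naming are invented here for illustration, and parsing the source format is left to you.

```python
import re
from datetime import date
from pathlib import Path


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated file name fragment."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def write_conversation(root, source, day, title, body):
    """Write one dated markdown file under <root>/memory/knowledge/<source>/."""
    out_dir = Path(root) / "memory" / "knowledge" / source
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{day.isoformat()}-{slugify(title)}.md"
    path.write_text(f"# {title}\n\n{body.strip()}\n", encoding="utf-8")
    return path
```

For example, `write_conversation(".", "notes-app", date(2024, 3, 1), "Rust vs. Go?", text)` would create `memory/knowledge/notes-app/2024-03-01-rust-vs-go.md` in the current workspace.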

## How the Pipeline Works

### Stage 1: Extract

**Script:** `memory-extract.py`

Reads daily notes (`memory/YYYY-MM-DD.md`) and session transcripts, then uses an LLM to extract structured facts:

```json
{"type": "decision", "content": "Use Rust for the backend", "subject": "Project Architecture", "confidence": 0.9}
{"type": "preference", "content": "Prefers Google Drive over Notion", "subject": "Tools", "confidence": 0.95}
```

**Output:** `memory/extracted.jsonl`

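Because the output is one JSON object per line, it is easy to inspect from your own scripts. The reader below is a minimal sketch: `load_facts` and its `min_confidence` filter are hypothetical conveniences, and only the `type` and `confidence` fields shown in the example above are assumed to exist.

```python
import json
from collections import defaultdict


def load_facts(lines, min_confidence=0.0):
    """Parse JSONL lines into facts grouped by their `type` field.

    `lines` can be an open file or any iterable of strings. Blank lines
    are skipped; facts below `min_confidence` are dropped.
    """
    by_type = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fact = json.loads(line)
        if fact.get("confidence", 0.0) >= min_confidence:
            by_type[fact["type"]].append(fact)
    return dict(by_type)
```

Typical usage would be `with open("memory/extracted.jsonl", encoding="utf-8") as f: facts = load_facts(f, min_confidence=0.9)`, after which `facts["decision"]` holds the high-confidence decisions.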

### Stage 2: Link

**Script:** `memory-link.py`

Takes extracted facts and builds a knowledge graph:

- Generates embeddings for semantic similarity
- Creates bidirectional links between related facts
- Detects contradictions and marks superseded facts
- Auto-generates domain tags

**Output:** `memory/knowledge-graph.json` + `memory/knowledge-summary.md`

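The linking step can be pictured as a pairwise similarity check over fact embeddings. The sketch below assumes cosine similarity and a plain dict-of-sets graph; `link_facts` is an illustrative name, and the real `memory-link.py` may compute similarity and store links differently. The 0.3 threshold matches the documented default.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_facts(embeddings, threshold=0.3):
    """Link every pair of facts whose similarity meets the threshold.

    `embeddings` maps a fact id to its vector. Each link is recorded on
    both facts, so the resulting graph is bidirectional.
    """
    links = {fact_id: set() for fact_id in embeddings}
    ids = list(embeddings)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                links[a].add(b)
                links[b].add(a)
    return links
```

Raising the threshold yields fewer, stronger links; lowering it connects more loosely related facts. Note that this pairwise loop is quadratic in the number of facts, which is fine for a sketch but may need an index for large memories.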

### Stage 3: Briefing

**Script:** `memory-briefing.py`

Generates a compact daily briefing (< 2000 chars) combining:

- Personality traits (from `SOUL.md`)
- User context (from `USER.md`)
- Active projects and recent decisions
- Open todos

**Output:** `BRIEFING.md` (workspace root)

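One simple way to respect the 2000-character budget is to add whole bullets in priority order and stop before the limit is crossed. The following is a sketch of that idea only: `build_briefing`, the `# Briefing` title, and the section layout are assumptions, and the actual script may rely on the LLM to do the trimming instead.

```python
def build_briefing(sections, max_chars=2000):
    """Assemble a briefing from (heading, bullets) pairs in priority order.

    Whole bullets are added until the next one would exceed `max_chars`;
    at that point assembly stops, so lower-priority content is dropped.
    A section heading is only emitted together with its first bullet.
    """
    lines = ["# Briefing"]
    used = len(lines[0])
    for heading, bullets in sections:
        head = f"\n## {heading}"  # leading newline leaves a blank line before it
        pending_head = True
        for bullet in bullets:
            item = f"- {bullet}"
            # +1 per added line accounts for the newline used when joining.
            cost = len(item) + 1 + (len(head) + 1 if pending_head else 0)
            if used + cost > max_chars:
                return "\n".join(lines)
            if pending_head:
                lines.append(head)
                pending_head = False
            lines.append(item)
            used += cost
    return "\n".join(lines)
```

Ordering the sections by importance matters here: whatever comes last is the first thing to be cut when the budget runs out.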

## Performance Hooks (Optional)

Four lifecycle hooks that enforce execution discipline during sessions. Based on a principle from performance psychology: **separate preparation from execution**.

```
User Message → Agent Loop
  ├── before_agent_start  → Briefing packet (memory + checklist)
  ├── before_tool_call    → Policy enforcement (deny list)
  ├── tool_result_persist → Output compression (prevent context bloat)
  └── agent_end           → After-action review (durable notes)
```

### Configuration

```json
{
  "enabled": true,
  "briefing": {
    "maxChars": 6000,
    "checklist": [
      "Restate the task in one sentence.",
      "List constraints and success criteria.",
      "Retrieve only the minimum relevant memory.",
      "Prefer tools over guessing when facts matter."
    ],
    "memoryFiles": ["memory/IDENTITY.md", "memory/PROJECTS.md"]
  },
  "tools": {
    "deny": ["dangerous_tool"],
    "maxToolResultChars": 12000
  },
  "afterAction": {
    "writeMemoryFile": "memory/AFTER_ACTION.md",
    "maxBullets": 8
  }
}
```
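To illustrate how the `tools.deny` setting might be applied before a tool call, here is a small sketch. The helper name `is_tool_allowed` is hypothetical, and treating `"enabled": false` as "allow everything" is an assumption about how the hooks behave when switched off.

```python
import json


def is_tool_allowed(tool_name, config):
    """Return False when hooks are enabled and the tool is on the deny list."""
    if not config.get("enabled", False):
        # Assumption: disabled hooks enforce nothing.
        return True
    return tool_name not in config.get("tools", {}).get("deny", [])


# A trimmed-down copy of the configuration shown above.
config = json.loads(
    '{"enabled": true, "tools": {"deny": ["dangerous_tool"], "maxToolResultChars": 12000}}'
)
```

With this configuration, `is_tool_allowed("dangerous_tool", config)` is `False`, while any tool not named in `deny` passes.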

### Hook Details

| Hook | What it does |
|------|-------------|
| `before_agent_start` | Loads memory files, builds bounded briefing packet, injects into system prompt |
| `before_tool_call` | Checks tool against deny list, prevents unsafe calls |
| `tool_result_persist` | Head (60%) + tail (30%) compression of large results |
| `agent_end` | Appends session summary to memory file with tools used and outcomes |
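
The head-plus-tail compression used by `tool_result_persist` can be sketched as follows. This is an illustration under assumptions, not the hook's code: `compress_result` is an invented name, the 60% and 30% shares are taken relative to `maxToolResultChars`, and the wording of the omission marker is made up.

```python
def compress_result(text, max_chars=12000, head_ratio=0.6, tail_ratio=0.3):
    """Keep the start and end of an oversized result and drop the middle.

    Results at or under `max_chars` are returned unchanged. The marker
    inserted in place of the middle is not counted against the budget
    in this sketch.
    """
    if len(text) <= max_chars:
        return text
    head_len = int(max_chars * head_ratio)
    tail_len = int(max_chars * tail_ratio)
    omitted = len(text) - head_len - tail_len
    marker = f"\n[... {omitted} characters omitted ...]\n"
    # Slice from an explicit index so tail_len == 0 yields an empty tail.
    return text[:head_len] + marker + text[len(text) - tail_len:]
```

Keeping both ends is a reasonable default for tool output, since commands tend to print their setup at the top and their errors or summaries at the bottom.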

## Output Files

| File | Location | Purpose |
|------|----------|---------|
| `BRIEFING.md` | Workspace root | Daily context cheat sheet |
| `extracted.jsonl` | `memory/` | All extracted facts (append-only) |
| `knowledge-graph.json` | `memory/` | Full graph with embeddings and links |
| `knowledge-summary.md` | `memory/` | Human-readable graph summary |
| `knowledge/chatgpt/*.md` | `memory/` | Ingested ChatGPT conversations |

## Customization

- **Change LLM models** — Edit model names in each script (supports OpenAI, Anthropic, Gemini)

- **Adjust extraction** — Modify the extraction prompt in `memory-extract.py` to focus on different fact types

- **Tune link sensitivity** — Change the similarity threshold in `memory-link.py` (default: 0.3)

- **Filter ingestion** — Edit `EXCLUDE_PATTERNS` in `ingest-chatgpt.py` for topic exclusion

## Troubleshooting

| Problem | Fix |
|---------|-----|
| No facts extracted | Check that daily notes or transcripts exist; verify API key |
| Low-quality links | Add OpenAI key for embedding-based similarity; adjust threshold |
| Briefing too long | Reduce facts in template or let LLM generation handle it (auto-constrained to 2000 chars) |

## See Also

- [Setup Guide](references/setup.md) — Detailed installation and configuration
