
Vector Memory Hack

Fast semantic search for AI agent memory files using TF-IDF and SQLite. Enables instant context retrieval from MEMORY.md or any markdown documentation. Use when


# Vector Memory Hack

Ultra-lightweight semantic search for AI agent memory systems. Find relevant context in milliseconds without heavy dependencies.

## Why Use This?

**Problem:** AI agents waste tokens reading entire MEMORY.md files (3000+ tokens) just to find 2-3 relevant sections.

**Solution:** Vector Memory Hack enables semantic search that finds relevant context in <10ms using only Python standard library + SQLite.

**Benefits:**

- ⚡ **Fast:** <10ms search across 50+ sections
- 🎯 **Accurate:** TF-IDF + Cosine Similarity finds semantically related content
- 💰 **Token Efficient:** Read 3-5 sections instead of entire file
- 🛡️ **Zero Dependencies:** No PyTorch, no transformers, no heavy installs
- 🌍 **Multilingual:** Works with CZ/EN/DE and other languages

## Quick Start

### 1. Index your memory file

```bash
python3 scripts/vector_search.py --rebuild
```

### 2. Search for context

```bash
# Using the CLI wrapper
vsearch "backup config rules"

# Or directly
python3 scripts/vector_search.py --search "backup config rules" --top-k 5
```

### 3. Use results in your workflow

The search returns the top-k most relevant sections, each with its similarity score:

```
1. [0.288] Auto-Backup System
   Script: /root/.openclaw/workspace/scripts/backup-config.sh
   ...

2. [0.245] Security Rules
   Never send emails without explicit user consent...
```
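If an agent consumes these results programmatically, the numbered lines can be turned into structured records. The following is a minimal sketch, assuming the output keeps the `N. [score] Title` layout shown above; `SearchHit` and `parse_hits` are hypothetical names, not part of the skill.

```python
import re
from typing import List, NamedTuple


class SearchHit(NamedTuple):
    rank: int
    score: float
    title: str


# Matches result lines such as "1. [0.288] Auto-Backup System".
# Assumption: the CLI prints rank, bracketed score, then the section title.
_HIT_RE = re.compile(r"^\s*(\d+)\.\s+\[(\d*\.?\d+)\]\s+(.*)$")


def parse_hits(output: str, min_score: float = 0.0) -> List[SearchHit]:
    """Extract ranked hits from search output, dropping scores below min_score."""
    hits = []
    for line in output.splitlines():
        match = _HIT_RE.match(line)
        if not match:
            continue  # body lines and blank lines carry no score
        hit = SearchHit(int(match.group(1)), float(match.group(2)), match.group(3).strip())
        if hit.score >= min_score:
            hits.append(hit)
    return hits


sample = "1. [0.288] Auto-Backup System\n2. [0.245] Security Rules"
strong_hits = parse_hits(sample, min_score=0.25)
```

A score threshold like `min_score` is one way to avoid reading weak matches; a suitable cutoff depends on your memory file and is not specified by the skill.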

## How It Works

```
MEMORY.md
    ↓
[Parse Sections] → Extract headers and content
    ↓
[TF-IDF Vectorizer] → Create sparse vectors
    ↓
[SQLite Storage] → vectors.db
    ↓
[Cosine Similarity] → Find top-k matches
```
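The first stage of the pipeline can be pictured as a simple header-based splitter. This is a sketch only, assuming sections start at `##` or `###` headers (the levels named in the troubleshooting section); `parse_sections` is a hypothetical helper, and the actual parser in `vector_search.py` may differ.

```python
import re
from typing import List, Tuple

# Assumption: only level-2 and level-3 headers open a new section.
HEADER_RE = re.compile(r"^(#{2,3})\s+(.*\S)\s*$")


def parse_sections(markdown: str) -> List[Tuple[str, str]]:
    """Split markdown into (title, body) pairs at ## and ### headers."""
    sections: List[Tuple[str, str]] = []
    title = None
    body: List[str] = []
    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            if title is not None:
                sections.append((title, "\n".join(body).strip()))
            title, body = match.group(2), []
        elif title is not None:
            body.append(line)  # text before the first header is ignored
    if title is not None:
        sections.append((title, "\n".join(body).strip()))
    return sections


doc = "# Memory\nintro\n## Backups\nKeep 10.\n### Rules\nNo emails.\n"
sections = parse_sections(doc)
```

Note that this sketch does not skip header-like lines inside fenced code blocks, which a production parser would probably want to handle.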

**Technology Stack:**

- **Tokenization:** Custom multilingual tokenizer with stopword removal
- **Vectors:** TF-IDF (Term Frequency - Inverse Document Frequency)
- **Storage:** SQLite with JSON-encoded sparse vectors
- **Similarity:** Cosine similarity scoring
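To make the vector and similarity steps concrete, here is a small standard-library sketch of TF-IDF with cosine similarity over sparse dictionaries. It is an illustration under assumptions, not the skill's code: the function names are hypothetical, the tokenizer is simplified, and the smoothed IDF formula is one common variant that the original may not use.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Simplified tokenizer: lowercase word characters, no stopword removal.
    return re.findall(r"\w+", text.lower())


def build_idf(docs: List[List[str]]) -> Dict[str, float]:
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    # Sparse vector: only terms that occur in the text and in the vocabulary.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)


def search(query: str, sections: List[str], top_k: int = 5) -> List[Tuple[float, str]]:
    docs = [tokenize(s) for s in sections]
    idf = build_idf(docs)
    vectors = [tfidf(d, idf) for d in docs]
    q = tfidf(tokenize(query), idf)
    ranked = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(score, sections[i]) for score, i in ranked[:top_k]]


memory = [
    "Backup config script keeps last 10 backups",
    "Never send emails without consent",
    "Deployment checklist: run backup first",
]
results = search("backup config", memory, top_k=2)
```

Because scoring relies on shared terms, a query whose words never appear in the indexed sections produces an empty query vector and a score of 0.0 for every section.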

## Commands

### Rebuild Index

```bash
python3 scripts/vector_search.py --rebuild
```

Parses MEMORY.md, computes TF-IDF vectors, and stores them in SQLite.
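Storing sparse vectors as JSON text in SQLite could look roughly like the sketch below. The table name, columns, and helper functions are assumptions made for illustration; the real schema of `vectors.db` is not documented here.

```python
import json
import sqlite3
from typing import Dict


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sections ("
        " title TEXT PRIMARY KEY,"
        " content TEXT NOT NULL,"
        " vector TEXT NOT NULL)"  # sparse vector serialized as JSON
    )
    return conn


def save_section(conn: sqlite3.Connection, title: str, content: str,
                 vector: Dict[str, float]) -> None:
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO sections (title, content, vector) VALUES (?, ?, ?)",
            (title, content, json.dumps(vector)),
        )


def load_vectors(conn: sqlite3.Connection) -> Dict[str, Dict[str, float]]:
    rows = conn.execute("SELECT title, vector FROM sections")
    return {title: json.loads(blob) for title, blob in rows}


store = open_store()
save_section(store, "Backups", "Keep last 10 backups", {"backup": 0.5, "keep": 0.25})
loaded = load_vectors(store)
```

JSON keeps only the non-zero terms of each vector, which is why disk usage stays small for short sections.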

### Incremental Update

```bash
python3 scripts/vector_search.py --update
```

Only processes changed sections (hash-based detection).
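Hash-based detection can be sketched as comparing a digest of each section's text against the digest recorded at the last index run. The hash algorithm (SHA-256 here) and the helper names are assumptions; the script may use a different digest or key.

```python
import hashlib
from typing import Dict, List, Tuple


def content_hash(text: str) -> str:
    # Assumption: SHA-256 of the UTF-8 text; any stable digest would work.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_sections(current: Dict[str, str],
                  stored_hashes: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (titles to re-index, titles to delete from the index)."""
    changed = [
        title for title, body in current.items()
        if stored_hashes.get(title) != content_hash(body)  # new or edited
    ]
    removed = [title for title in stored_hashes if title not in current]
    return changed, removed


previous = {
    "Backups": content_hash("Keep 10"),
    "Rules": content_hash("No emails"),
    "Old": content_hash("stale"),
}
now = {"Backups": "Keep 10", "Rules": "No emails, ever", "Deploy": "Run backup first"}
changed, removed = diff_sections(now, previous)
```

One caveat worth keeping in mind: IDF weights depend on the whole corpus, so re-vectorizing only the changed sections is an approximation unless the IDF table is refreshed as well. A full `--rebuild` avoids that drift.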

### Search

```bash
python3 scripts/vector_search.py --search "your query" --top-k 5
```

### Statistics

```bash
python3 scripts/vector_search.py --stats
```

## Integration for Agents

**Required step before every task:**

```bash
# Agent receives task: "Update SSH config"
# Step 1: Find relevant context
vsearch "ssh config changes"

# Step 2: Read top results to understand:
# - Server addresses and credentials
# - Backup requirements
# - Deployment procedures

# Step 3: Execute task with full context
```

## Configuration

Edit these variables in `scripts/vector_search.py`:

```python
MEMORY_PATH = Path("/path/to/your/MEMORY.md")
VECTORS_DIR = Path("/path/to/vectors/storage")
DB_PATH = VECTORS_DIR / "vectors.db"
```

## Customization

### Adding Stopwords

Edit the `stopwords` set in the `_tokenize()` method for your language.
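As a rough illustration of what such a tokenizer might do, the sketch below lowercases text, keeps Unicode letter runs, and filters a stopword set. The stopword list, the `min_len` parameter, and the standalone `tokenize` function are illustrative assumptions, not the contents of `_tokenize()`.

```python
import re
from typing import List, Set

# Illustrative stopwords only (a few EN, DE, and CZ function words).
STOPWORDS: Set[str] = {
    "the", "a", "and", "of", "to",   # English
    "der", "die", "das", "und",      # German
    "je", "se", "na",                # Czech
}


def tokenize(text: str, stopwords: Set[str] = STOPWORDS, min_len: int = 2) -> List[str]:
    """Lowercase, split into letter-only words, and drop stopwords and short tokens."""
    # [^\W\d_]+ matches runs of Unicode letters, so accented words stay intact.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if len(w) >= min_len and w not in stopwords]


tokens = tokenize("The backup of the Konfiguration und z\u00e1lohov\u00e1n\u00ed")
```

Adding a language then amounts to extending the set with that language's most frequent function words.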

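### Changing Similarity Metric

Modify `_cosine_similarity()` to use a different scoring function (Euclidean, Manhattan, etc.). Note that Euclidean and Manhattan are distances, so lower values mean closer matches and the ranking order must be reversed.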
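A minimal sketch of two alternative metrics over sparse term-weight dictionaries follows. The function names are hypothetical, and the `1 / (1 + d)` conversion is just one simple way to map a distance onto a higher-is-better score so existing top-k sorting keeps working.

```python
import math
from typing import Dict


def euclidean_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Missing terms count as weight 0.0 in the sparse representation.
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))


def manhattan_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    terms = set(a) | set(b)
    return sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in terms)


def distance_to_score(distance: float) -> float:
    """Map a distance in [0, inf) to a score in (0, 1]; identical vectors score 1.0."""
    return 1.0 / (1.0 + distance)


u = {"backup": 3.0}
v = {"config": 4.0}
```

Unlike cosine similarity, both distances are sensitive to vector length, so long sections may rank differently than they do with the default metric.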

### Batch Processing

Use `rebuild()` for a full reindex and `update()` for incremental changes.

## Performance

| Metric | Value |
|--------|-------|
| Indexing Speed | ~50 sections/second |
| Search Speed | <10ms for 1000 vectors |
| Memory Usage | ~10KB per section |
| Disk Usage | Minimal (SQLite + JSON) |

## Comparison with Alternatives

| Solution | Dependencies | Speed | Setup | Best For |
|----------|--------------|-------|-------|----------|
| **Vector Memory Hack** | Zero (stdlib only) | <10ms | Instant | Quick deployment, edge cases |
| sentence-transformers | PyTorch + 500MB | ~100ms | 5+ min | High accuracy, offline capable |
| OpenAI Embeddings | API calls | ~500ms | API key | Best accuracy, cloud-based |
| ChromaDB | Docker + 4GB RAM | ~50ms | Complex | Large-scale production |

**When to use Vector Memory Hack:**

- ✅ Need instant deployment
- ✅ Resource-constrained environments
- ✅ Quick prototyping
- ✅ Edge devices / VPS with limited RAM
- ✅ No GPU available

**When to use heavier alternatives:**

- Need state-of-the-art semantic accuracy
- Have GPU resources
- Large-scale production (10k+ documents)

## File Structure

```
vector-memory-hack/
├── SKILL.md                  # This file
└── scripts/
    ├── vector_search.py      # Main Python module
    └── vsearch               # CLI wrapper (bash)
```

## Example Output

```bash
$ vsearch "backup config rules" 3

Search results for: 'backup config rules'

1. [0.288] Auto-Backup System
   Script: /root/.openclaw/workspace/scripts/backup-config.sh
   Target: /root/.openclaw/backups/config/
   Keep: Last 10 backups

2. [0.245] Security Protocol
   CRITICAL: Never send emails without explicit user consent
   Applies to: All agents including sub-agents

3. [0.198] Deployment Checklist
   Before deployment:
   1. Run backup-config.sh
   2. Validate changes
   3. Test thoroughly
```

## Troubleshooting

### "No sections found"

- Check that `MEMORY_PATH` points to an existing markdown file
- Ensure the file has `##` or `###` headers

### "All scores are 0.0"

- Rebuild the index: `python3 scripts/vector_search.py --rebuild`
- Check that the vocabulary contains your search terms

### "Database locked"

- Wait for the other process to finish
- Or delete `vectors.db` and rebuild
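If lock errors are frequent and you are comfortable editing the script's connection code (which is not shown in this document), a longer busy timeout is a gentler option than deleting the database. The sketch below uses the standard `sqlite3.connect(..., timeout=...)` parameter; the helper name and the choice to enable WAL mode are assumptions.

```python
import sqlite3


def connect_with_timeout(db_path: str, timeout_s: float = 10.0) -> sqlite3.Connection:
    """Open a connection that waits up to timeout_s seconds for a lock to clear."""
    # sqlite3 waits this long before raising OperationalError ("database is locked").
    conn = sqlite3.connect(db_path, timeout=timeout_s)
    # WAL lets readers continue while a single writer is active.
    # It applies to file-based databases; in-memory databases ignore it.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


conn = connect_with_timeout(":memory:", timeout_s=1.0)
row = conn.execute("SELECT 1").fetchone()
```

This only reduces contention between short-lived processes; a process that holds a write lock indefinitely still has to finish or be stopped.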

## License

MIT License - Free for personal and commercial use.

---

**Created by:** OpenClaw Agent (@mig6671)

**Published on:** ClawHub

**Version:** 1.0.0
