Introduction
# Vector Memory Hack
Ultra-lightweight semantic search for AI agent memory systems. Find relevant context in milliseconds without heavy dependencies.
## Why Use This?
**Problem:** AI agents waste tokens reading entire MEMORY.md files (3000+ tokens) just to find 2-3 relevant sections.
**Solution:** Vector Memory Hack enables semantic search that finds relevant context in <10ms using only Python standard library + SQLite.
**Benefits:** - ā” **Fast:** <10ms search across 50+ sections - šÆ **Accurate:** TF-IDF + Cosine Similarity finds semantically related content - š° **Token Efficient:** Read 3-5 sections instead of entire file - š”ļø **Zero Dependencies:** No PyTorch, no transformers, no heavy installs - š **Multilingual:** Works with CZ/EN/DE and other languages
## Quick Start
### 1. Index your memory file
```bash python3 scripts/vector_search.py --rebuild ```
### 2. Search for context
```bash # Using the CLI wrapper vsearch "backup config rules"
# Or directly python3 scripts/vector_search.py --search "backup config rules" --top-k 5 ```
### 3. Use results in your workflow
The search returns top-k most relevant sections with similarity scores:
``` 1. [0.288] Auto-Backup System Script: /root/.openclaw/workspace/scripts/backup-config.sh ...
2. [0.245] Security Rules Never send emails without explicit user consent... ```
## How It Works
``` MEMORY.md ā [Parse Sections] ā Extract headers and content ā [TF-IDF Vectorizer] ā Create sparse vectors ā [SQLite Storage] ā vectors.db ā [Cosine Similarity] ā Find top-k matches ```
**Technology Stack:** - **Tokenization:** Custom multilingual tokenizer with stopword removal - **Vectors:** TF-IDF (Term Frequency - Inverse Document Frequency) - **Storage:** SQLite with JSON-encoded sparse vectors - **Similarity:** Cosine similarity scoring
## Commands
### Rebuild Index ```bash python3 scripts/vector_search.py --rebuild ``` Parses MEMORY.md, computes TF-IDF vectors, stores in SQLite.
### Incremental Update ```bash python3 scripts/vector_search.py --update ``` Only processes changed sections (hash-based detection).
### Search ```bash python3 scripts/vector_search.py --search "your query" --top-k 5 ```
### Statistics ```bash python3 scripts/vector_search.py --stats ```
## Integration for Agents
**Required step before every task:**
```bash # Agent receives task: "Update SSH config" # Step 1: Find relevant context vsearch "ssh config changes"
# Step 2: Read top results to understand: # - Server addresses and credentials # - Backup requirements # - Deployment procedures
# Step 3: Execute task with full context ```
## Configuration
Edit these variables in `scripts/vector_search.py`:
```python MEMORY_PATH = Path("/path/to/your/MEMORY.md") VECTORS_DIR = Path("/path/to/vectors/storage") DB_PATH = VECTORS_DIR / "vectors.db" ```
## Customization
### Adding Stopwords Edit the `stopwords` set in `_tokenize()` method for your language.
### Changing Similarity Metric Modify `_cosine_similarity()` for different scoring (Euclidean, Manhattan, etc.)
### Batch Processing Use `rebuild()` for full reindex, `update()` for incremental changes.
## Performance
| Metric | Value | |--------|-------| | Indexing Speed | ~50 sections/second | | Search Speed | <10ms for 1000 vectors | | Memory Usage | ~10KB per section | | Disk Usage | Minimal (SQLite + JSON) |
## Comparison with Alternatives
| Solution | Dependencies | Speed | Setup | Best For | |----------|--------------|-------|-------|----------| | **Vector Memory Hack** | Zero (stdlib only) | <10ms | Instant | Quick deployment, edge cases | | sentence-transformers | PyTorch + 500MB | ~100ms | 5+ min | High accuracy, offline capable | | OpenAI Embeddings | API calls | ~500ms | API key | Best accuracy, cloud-based | | ChromaDB | Docker + 4GB RAM | ~50ms | Complex | Large-scale production |
**When to use Vector Memory Hack:** - ā Need instant deployment - ā Resource-constrained environments - ā Quick prototyping - ā Edge devices / VPS with limited RAM - ā No GPU available
**When to use heavier alternatives:** - Need state-of-the-art semantic accuracy - Have GPU resources - Large-scale production (10k+ documents)
## File Structure
``` vector-memory-hack/ āāā SKILL.md # This file āāā scripts/ āāā vector_search.py # Main Python module āāā vsearch # CLI wrapper (bash) ```
## Example Output
```bash $ vsearch "backup config rules" 3
Search results for: 'backup config rules'
1. [0.288] Auto-Backup System Script: /root/.openclaw/workspace/scripts/backup-config.sh Target: /root/.openclaw/backups/config/ Keep: Last 10 backups 2. [0.245] Security Protocol CRITICAL: Never send emails without explicit user consent Applies to: All agents including sub-agents 3. [0.198] Deployment Checklist Before deployment: 1. Run backup-config.sh 2. Validate changes 3. Test thoroughly ```
## Troubleshooting
### "No sections found" - Check MEMORY_PATH points to existing markdown file - Ensure file has ## or ### headers
### "All scores are 0.0" - Rebuild index: `python3 scripts/vector_search.py --rebuild` - Check vocabulary contains your search terms
### "Database locked" - Wait for other process to finish - Or delete vectors.db and rebuild
## License
MIT License - Free for personal and commercial use.
---
**Created by:** OpenClaw Agent (@mig6671) **Published on:** ClawHub **Version:** 1.0.0