# Prompt Assemble
## Overview
A standardized, token-safe prompt assembly framework designed to keep LLM API calls stable. It implements **Two-Phase Context Construction** and a **Memory Safety Valve** to prevent token overflow while maximizing relevant context.

**Design Goals:**

- ✅ Never fail due to memory-related token overflow
- ✅ Memory is always discardable enhancement, never rigid dependency
- ✅ Token budget decisions centralized at prompt assemble layer

## When to Use
Use this skill when:

1. Building or modifying any agent that constructs prompts
2. Implementing memory retrieval systems
3. Adding new prompt-related logic to existing agents
4. Any scenario where token budget safety is required

## Core Workflow
```
User Input
    ↓
Need-Memory Decision
    ↓
Minimal Context Build
    ↓
Memory Retrieval (Optional)
    ↓
Memory Summarization
    ↓
Token Estimation
    ↓
Safety Valve Decision
    ↓
Final Prompt → LLM Call
```

## Phase Details
### Phase 0: Base Configuration

```python
# Model Context Windows (2026-02-04)
# - MiniMax-M2.1: 204,000 tokens (default)
# - Claude 3.5 Sonnet: 200,000 tokens
# - GPT-4o: 128,000 tokens

MAX_TOKENS = 204000                     # Set to your model's context limit
SAFETY_MARGIN = int(0.75 * MAX_TOKENS)  # Conservative: 75% threshold = 153,000 tokens
MEMORY_TOP_K = 3                        # Max 3 memories
MEMORY_SUMMARY_MAX = 3                  # Max 3 lines per memory
```

**Design Philosophy**:

- Leave 25% buffer for safety (model overhead, estimation errors, spikes)
- Better to underutilize capacity than to overflow

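
The 75% threshold can be worked out per model. The sketch below uses the context-window figures listed in Phase 0; the `MODEL_CONTEXT_WINDOWS` table and the `safety_margin` helper are illustrative names, not part of `prompt_assemble.py`.

```python
# Context windows as listed in Phase 0 (verify against your provider's docs).
MODEL_CONTEXT_WINDOWS = {
    "MiniMax-M2.1": 204_000,
    "Claude 3.5 Sonnet": 200_000,
    "GPT-4o": 128_000,
}


def safety_margin(model: str, ratio: float = 0.75) -> int:
    """Return the token threshold above which memory is dropped (hypothetical helper)."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    return int(MODEL_CONTEXT_WINDOWS[model] * ratio)


for name in MODEL_CONTEXT_WINDOWS:
    print(f"{name}: {safety_margin(name):,} tokens")
```

With the default ratio, the remaining 25% of each window stays unused as buffer.
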
### Phase 1: Minimal Context

- System prompt
- Recent N messages (N=3, trimmed)
- Current user input
- **No memory by default**

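
A minimal sketch of how the base context might be built from the three items above. The function name, the list-of-strings representation, and the 500-character trim limit are assumptions made for illustration; the actual implementation in `scripts/prompt_assemble.py` may differ.

```python
def build_minimal_context(system_prompt, recent_messages, user_input,
                          n_recent=3, max_chars_per_message=500):
    """Return [system prompt, last N trimmed messages, user input]. No memory included."""
    # Guard n_recent <= 0: recent_messages[-0:] would return the whole list.
    recent = recent_messages[-n_recent:] if n_recent > 0 else []
    trimmed = []
    for msg in recent:
        if len(msg) > max_chars_per_message:
            # Only history is trimmed; system prompt and user input stay intact.
            msg = msg[:max_chars_per_message] + "..."
        trimmed.append(msg)
    return [system_prompt] + trimmed + [user_input]


base_context = build_minimal_context(
    "You are a helpful assistant.",
    ["msg 1", "msg 2", "msg 3", "msg 4"],
    "What did we decide last time?",
)
print(base_context)
```
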
### Phase 2: Memory Need Decision

```python
def need_memory(user_input):
    # Phrases that suggest the user is referring to earlier context
    triggers = [
        "previously",
        "earlier we discussed",
        "do you remember",
        "as I mentioned before",
        "continuing from",
        "before we",
        "last time",
        "previously mentioned"
    ]
    # Case-insensitive substring match against each trigger
    for trigger in triggers:
        if trigger.lower() in user_input.lower():
            return True
    return False
```

### Phase 3: Memory Retrieval (Optional)

```python
# Retrieve at most MEMORY_TOP_K memories and summarize each one
summarized_memories = []
memories = memory_search(query=user_input, top_k=MEMORY_TOP_K)
for mem in memories:
    summarized_memories.append(summarize(mem, max_lines=MEMORY_SUMMARY_MAX))
```

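
The `summarize` helper is not defined in this document. As a stand-in, the sketch below simply keeps the first few non-empty lines of a memory and caps each line's length. This is truncation, not real summarization, and the `max_chars_per_line` parameter is an assumption added for illustration.

```python
def summarize(memory_text, max_lines=3, max_chars_per_line=200):
    """Placeholder summarizer: keep the first `max_lines` non-empty lines, each capped."""
    lines = [ln.strip() for ln in memory_text.splitlines() if ln.strip()]
    kept = [ln[:max_chars_per_line] for ln in lines[:max_lines]]
    return "\n".join(kept)


memory = "User prefers concise answers.\n\nWorks in Python.\nTimezone: UTC+8.\nLikes tea."
print(summarize(memory))
```

Whatever replaces this placeholder should keep the same guarantee: the output never exceeds `max_lines` lines, so the worst-case memory size stays bounded.
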
### Phase 4: Token Estimation

Calculate estimated tokens for `base_context + summarized_memories`.

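
One possible estimator is sketched below. It assumes roughly 4 characters per token, which is a common rule of thumb for English text and not an exact count; the function names and the per-item overhead are illustrative. A model-specific tokenizer would give tighter numbers, and the 25% buffer exists partly to absorb this kind of estimation error.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def estimate_prompt_tokens(base_context, summarized_memories, per_item_overhead=4):
    """Sum estimates over all parts, adding a small assumed overhead per part."""
    parts = list(base_context) + list(summarized_memories)
    return sum(estimate_tokens(p) + per_item_overhead for p in parts)


estimated_tokens = estimate_prompt_tokens(
    ["You are a helpful assistant.", "What did we decide last time?"],
    ["User prefers concise answers."],
)
print(estimated_tokens)
```
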
### Phase 5: Safety Valve (Critical)

```python
if estimated_tokens > SAFETY_MARGIN:
    # Over budget: drop the memory layer and note that it was skipped
    base_context.append("[System Notice] Relevant memory skipped due to token budget.")
    return assemble(base_context)
```

**Hard Rules:**

- ❌ Never downgrade system prompt
- ❌ Never truncate user input
- ❌ No "lucky splicing"
- ✅ Only memory layer is expendable

### Phase 6: Final Assembly

```python
final_prompt = assemble(base_context + summarized_memories)
return final_prompt
```

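
Phases 4 through 6 can be combined into one function, as in the sketch below. The `assemble` and `estimate_tokens` bodies here are stand-ins (joining with blank lines, and a rough 4-characters-per-token heuristic), and `finalize` is a hypothetical name; none of them are taken from `scripts/prompt_assemble.py`.

```python
MAX_TOKENS = 204000
SAFETY_MARGIN = int(0.75 * MAX_TOKENS)
SKIP_NOTICE = "[System Notice] Relevant memory skipped due to token budget."


def estimate_tokens(parts):
    # Assumed heuristic: about 4 characters per token, plus 1 per part.
    return sum(len(p) // 4 + 1 for p in parts)


def assemble(parts):
    # Stand-in assembler: join the parts with blank lines.
    return "\n\n".join(parts)


def finalize(base_context, summarized_memories, safety_margin=SAFETY_MARGIN):
    """Include memory only if the whole prompt fits under the safety margin."""
    candidate = base_context + summarized_memories
    if estimate_tokens(candidate) > safety_margin:
        # Safety valve: memory is dropped; base context is left untouched.
        return assemble(base_context + [SKIP_NOTICE])
    return assemble(candidate)


print(finalize(["You are a helpful assistant.", "Hello"], ["User prefers tea."]))
```

Note that only the memory layer is ever removed; the system prompt and user input pass through unchanged in both branches, in line with the hard rules above.
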
## Memory Data Standards
### Allowed in Long-Term Memory

- ✅ User preferences / identity / long-term goals
- ✅ Confirmed important conclusions
- ✅ System-level settings and rules

### Forbidden in Long-Term Memory

- ❌ Raw conversation logs
- ❌ Reasoning traces
- ❌ Temporary discussions
- ❌ Information recoverable from chat history

## Quick Start
Copy `scripts/prompt_assemble.py` to your agent and use:
```python
from prompt_assemble import build_prompt

# In your agent's prompt construction:
final_prompt = build_prompt(user_input, memory_search_fn, get_recent_dialog_fn)
```

## Resources
### scripts/

- `prompt_assemble.py` - Complete implementation with all phases (PromptAssembler class)

### references/

- `memory_standards.md` - Detailed memory content guidelines
- `token_estimation.md` - Token counting strategies