Prompt Safe

Introduction

# Prompt Assemble

## Overview

A standardized, token-safe prompt assembly framework that guarantees API stability. Implements **Two-Phase Context Construction** and **Memory Safety Valve** to prevent token overflow while maximizing relevant context.

**Design Goals:** - ✅ Never fail due to memory-related token overflow - ✅ Memory is always discardable enhancement, never rigid dependency - ✅ Token budget decisions centralized at prompt assemble layer

## When to Use

Use this skill when: 1. Building or modifying any agent that constructs prompts 2. Implementing memory retrieval systems 3. Adding new prompt-related logic to existing agents 4. Any scenario where token budget safety is required

## Core Workflow

``` User Input ↓ Need-Memory Decision ↓ Minimal Context Build ↓ Memory Retrieval (Optional) ↓ Memory Summarization ↓ Token Estimation ↓ Safety Valve Decision ↓ Final Prompt → LLM Call ```

## Phase Details

### Phase 0: Base Configuration ```python # Model Context Windows (2026-02-04) # - MiniMax-M2.1: 204,000 tokens (default) # - Claude 3.5 Sonnet: 200,000 tokens # - GPT-4o: 128,000 tokens

MAX_TOKENS = 204000 # Set to your model's context limit SAFETY_MARGIN = 0.75 * MAX_TOKENS # Conservative: 75% threshold = 153,000 tokens MEMORY_TOP_K = 3 # Max 3 memories MEMORY_SUMMARY_MAX = 3 lines # Max 3 lines per memory ```

**Design Philosophy**: - Leave 25% buffer for safety (model overhead, estimation errors, spikes) - Better to underutilize capacity than to overflow

### Phase 1: Minimal Context - System prompt - Recent N messages (N=3, trimmed) - Current user input - **No memory by default**

### Phase 2: Memory Need Decision ```python def need_memory(user_input): triggers = [ "previously", "earlier we discussed", "do you remember", "as I mentioned before", "continuing from", "before we", "last time", "previously mentioned" ] for trigger in triggers: if trigger.lower() in user_input.lower(): return True return False ```

### Phase 3: Memory Retrieval (Optional) ```python memories = memory_search(query=user_input, top_k=MEMORY_TOP_K) for mem in memories: summarized_memories.append(summarize(mem, max_lines=MEMORY_SUMMARY_MAX)) ```

### Phase 4: Token Estimation Calculate estimated tokens for base_context + summarized_memories.

### Phase 5: Safety Valve (Critical) ```python if estimated_tokens > SAFETY_MARGIN: base_context.append("[System Notice] Relevant memory skipped due to token budget.") return assemble(base_context) ```

**Hard Rules:** - ❌ Never downgrade system prompt - ❌ Never truncate user input - ❌ No "lucky splicing" - ✅ Only memory layer is expendable

### Phase 6: Final Assembly ```python final_prompt = assemble(base_context + summarized_memories) return final_prompt ```

## Memory Data Standards

### Allowed in Long-Term Memory - ✅ User preferences / identity / long-term goals - ✅ Confirmed important conclusions - ✅ System-level settings and rules

### Forbidden in Long-Term Memory - ❌ Raw conversation logs - ❌ Reasoning traces - ❌ Temporary discussions - ❌ Information recoverable from chat history

## Quick Start

Copy `scripts/prompt_assemble.py` to your agent and use:

```python from prompt_assemble import build_prompt

# In your agent's prompt construction: final_prompt = build_prompt(user_input, memory_search_fn, get_recent_dialog_fn) ```

## Resources

### scripts/ - `prompt_assemble.py` - Complete implementation with all phases (PromptAssembler class)

### references/ - `memory_standards.md` - Detailed memory content guidelines - `token_estimation.md` - Token counting strategies

Back

Introduction

More Products

Nano Banana Pro

Gemini

Pg Release