# LLM Supervisor 🔮

Handles cloud rate limits and overload errors by telling the user and, with their approval, falling back to a local Ollama model.

## Behavior

### On Rate Limit / Overload Errors

When I encounter rate limits or overload errors from cloud providers (Anthropic, OpenAI):

1. **Tell the user immediately**: don't silently fail or retry endlessly.
2. **Offer local fallback**: ask if they want to switch to Ollama.
3. **Wait for confirmation**: never auto-switch for code generation tasks.
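A minimal sketch of steps 1 and 2 as shell functions. The function names are hypothetical, and the status-code mapping is an assumption: 429 is the usual rate-limit code for both providers, Anthropic uses 529 for overloaded errors, and 503 is treated as "overloaded" here as well.

```bash
# Hypothetical helper (not part of any real CLI): classify a cloud provider's
# HTTP status code. Assumed mapping: 429 = rate limit, 503/529 = overloaded.
classify_llm_error() {
  case "$1" in
    429)     echo "rate-limited" ;;
    503|529) echo "overloaded" ;;
    *)       echo "other" ;;
  esac
}

# Steps 1 and 2: tell the user right away and offer the local fallback.
# Step 3 is the caller's job: this function never switches providers itself.
notify_cloud_failure() {
  local kind
  kind=$(classify_llm_error "$1")
  if [ "$kind" = "other" ]; then
    return 1    # not a rate-limit/overload error; handle it elsewhere
  fi
  echo "Cloud provider is $kind (HTTP $1). Switch to local Ollama? Reply 'yes' to confirm."
}
```

Because `notify_cloud_failure` returns immediately after printing, it cannot fall into an endless retry loop; any retry policy stays an explicit decision of the caller.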

### Confirmation Required

Before using local models for code generation, ask:

> "Cloud is rate-limited. Switch to local Ollama (`qwen2.5:7b`)? Reply 'yes' to confirm."

For simple queries (chat, summaries), I can switch without asking again if the user has already approved the switch.
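The confirmation rules can be sketched as two shell functions. Both names are hypothetical, and the sketch reads the rules strictly: code generation always asks, even if the user approved local fallback earlier for a simple query.

```bash
# Hypothetical policy check. $1 = task type ("code" or anything else),
# $2 = "true" if the user already approved local fallback earlier.
# Exit status 0 means "ask first", 1 means "may switch without asking".
needs_confirmation() {
  if [ "$1" = "code" ]; then
    return 0    # code generation: never auto-switch
  fi
  if [ "$2" = "true" ]; then
    return 1    # simple query with prior approval: switch silently
  fi
  return 0      # simple query without prior approval: still ask
}

# Print the prompt on stderr, read one line, succeed only on an exact "yes".
ask_switch_to_local() {
  local reply
  echo "Cloud is rate-limited. Switch to local Ollama (qwen2.5:7b)? Reply 'yes' to confirm." >&2
  read -r reply || return 1
  [ "$reply" = "yes" ]
}
```

Requiring the exact word `yes` keeps an ambiguous reply such as "sure, but later" from counting as consent.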

## Commands

### `/llm status`

Report current state:

- Which provider is active (cloud/local)
- Ollama availability and models
- Recent rate limit events

### `/llm switch local`

Manually switch to Ollama for the session.

### `/llm switch cloud`

Switch back to the cloud provider.
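One way to picture the three commands is a small dispatcher. This is an illustrative shell sketch, not how the commands are actually wired: `llm_status` and `llm_command` are hypothetical names, the variables reuse the field names from State Tracking, and the only external calls are `command -v` and `ollama list`.

```bash
# Hypothetical session state; keep existing values if already set.
currentProvider="${currentProvider:-cloud}"
lastRateLimitAt="${lastRateLimitAt:-}"

# /llm status: active provider, Ollama availability and models, last rate limit.
llm_status() {
  echo "Provider: $currentProvider"
  if command -v ollama >/dev/null 2>&1; then
    # `ollama list` fails when the Ollama server is not running
    ollama list 2>/dev/null || echo "Ollama: installed but not responding"
  else
    echo "Ollama: not installed"
  fi
  echo "Last rate limit: ${lastRateLimitAt:-none}"
}

# Dispatch "/llm <args>" to the matching action.
llm_command() {
  case "$*" in
    "status")       llm_status ;;
    "switch local") currentProvider="local"; echo "Switched to local Ollama for this session." ;;
    "switch cloud") currentProvider="cloud"; echo "Switched back to the cloud provider." ;;
    *) echo "usage: /llm status | /llm switch local | /llm switch cloud" >&2; return 1 ;;
  esac
}
```

Matching on `"$*"` keeps the command table readable, at the cost of treating extra whitespace or arguments as an unknown command.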

## Using Ollama

```bash
# Check available models
ollama list

# Run a query
ollama run qwen2.5:7b "your prompt here"

# For longer prompts, use stdin
echo "your prompt" | ollama run qwen2.5:7b
```
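When prompts are assembled by a script, a small wrapper keeps the stdin form consistent. `ask_local` is a hypothetical name; it relies only on `ollama run <model>` reading the prompt from stdin, as in the last command above.

```bash
# Hypothetical wrapper: send a prompt to the local model over stdin.
# $1 = prompt text, $2 = model (defaults to the configured qwen2.5:7b).
ask_local() {
  # printf instead of echo, so prompts that start with "-" or contain
  # backslashes reach the model unchanged.
  printf '%s\n' "$1" | ollama run "${2:-qwen2.5:7b}"
}
```

Usage: `ask_local "Summarize this log"` or `ask_local "Explain this diff" llama3`. For a prompt stored in a file, redirecting it (`ollama run qwen2.5:7b < prompt.txt`) works the same way.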

## Installed Models

Check with `ollama list`. The configured default is `qwen2.5:7b`.
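To check for the default model from a script, the `ollama list` table can be filtered with `awk`. This sketch assumes the layout Ollama prints (a header row, then one model per line with the name in the first column); `has_model` is a hypothetical helper.

```bash
# Hypothetical helper: succeed if `ollama list` output (read from stdin)
# has an exact match for the given model in its NAME column.
# Skips the header row; defaults to the configured qwen2.5:7b.
has_model() {
  awk -v want="${1:-qwen2.5:7b}" '
    NR > 1 && $1 == want { found = 1 }
    END { exit !found }   # exit 0 if found, 1 otherwise
  '
}
```

Typical use: `ollama list | has_model || ollama pull qwen2.5:7b`. The match is exact, so `llama3` does not match a row listed as `llama3:latest`.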

## State Tracking

Track in memory during the session:

- `currentProvider`: "cloud" | "local"
- `lastRateLimitAt`: timestamp or null
- `localConfirmedForCode`: boolean

Reset to cloud at session start.
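In a shell-based harness, the same state could live in plain variables. A sketch with hypothetical helper functions; an empty string stands in for null, and timestamps are Unix seconds from `date +%s`.

```bash
# Hypothetical shell version of the session state above.
reset_llm_state() {
  currentProvider="cloud"
  lastRateLimitAt=""            # empty string = null
  localConfirmedForCode=false
}

# Call when a cloud request comes back rate-limited or overloaded.
record_rate_limit() {
  lastRateLimitAt=$(date +%s)   # Unix timestamp, seconds
}

# Call once the user has replied 'yes' to the code-generation prompt.
confirm_local_for_code() {
  localConfirmedForCode=true
  currentProvider="local"
}

reset_llm_state   # session start: back to cloud
```

Keeping the reset in one function means session start and any manual reset restore exactly the same defaults.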
