ClawSkills logoClawSkills

Causal Inference

Add causal reasoning to agent actions. Trigger on ANY high-level action with observable outcomes - emails, messages, calendar changes, file operations, API call

Introduction

# Causal Inference

A lightweight causal layer for predicting action outcomes, not by pattern-matching correlations, but by modeling interventions and counterfactuals.

## Core Invariant

**Every action must be representable as an explicit intervention on a causal model, with predicted effects + uncertainty + a falsifiable audit trail.**

Plans must be *causally valid*, not just plausible.

## When to Trigger

**Trigger this skill on ANY high-level action**, including but not limited to:

| Domain | Actions to Log | |--------|---------------| | **Communication** | Send email, send message, reply, follow-up, notification, mention | | **Calendar** | Create/move/cancel meeting, set reminder, RSVP | | **Tasks** | Create/complete/defer task, set priority, assign | | **Files** | Create/edit/share document, commit code, deploy | | **Social** | Post, react, comment, share, DM | | **Purchases** | Order, subscribe, cancel, refund | | **System** | Config change, permission grant, integration setup |

Also trigger when: - **Reviewing outcomes** — "Did that email get a reply?" → log outcome, update estimates - **Debugging failures** — "Why didn't this work?" → trace causal graph - **Backfilling history** — "Analyze my past emails/calendar" → parse logs, reconstruct actions - **Planning** — "Should I send now or later?" → query causal model

## Backfill: Bootstrap from Historical Data

Don't start from zero. Parse existing logs to reconstruct past actions + outcomes.

### Email Backfill

```bash # Extract sent emails with reply status gog gmail list --sent --after 2024-01-01 --format json > /tmp/sent_emails.json

# For each sent email, check if reply exists python3 scripts/backfill_email.py /tmp/sent_emails.json ```

### Calendar Backfill

```bash # Extract past events with attendance gog calendar list --after 2024-01-01 --format json > /tmp/events.json

# Reconstruct: did meeting happen? was it moved? attendee count? python3 scripts/backfill_calendar.py /tmp/events.json ```

### Message Backfill (WhatsApp/Discord/Slack)

```bash # Parse message history for send/reply patterns wacli search --after 2024-01-01 --from me --format json > /tmp/wa_sent.json python3 scripts/backfill_messages.py /tmp/wa_sent.json ```

### Generic Backfill Pattern

```python # For any historical data source: for record in historical_data: action_event = { "action": infer_action_type(record), "context": extract_context(record), "time": record["timestamp"], "pre_state": reconstruct_pre_state(record), "post_state": extract_post_state(record), "outcome": determine_outcome(record), "backfilled": True # Mark as reconstructed } append_to_log(action_event) ```

## Architecture

### A. Action Log (required)

Every executed action emits a structured event:

```json { "action": "send_followup", "domain": "email", "context": {"recipient_type": "warm_lead", "prior_touches": 2}, "time": "2025-01-26T10:00:00Z", "pre_state": {"days_since_last_contact": 7}, "post_state": {"reply_received": true, "reply_delay_hours": 4}, "outcome": "positive_reply", "outcome_observed_at": "2025-01-26T14:00:00Z", "backfilled": false } ```

Store in `memory/causal/action_log.jsonl`.

### B. Causal Graphs (per domain)

Start with 10-30 observable variables per domain.

**Email domain:** ``` send_time → reply_prob subject_style → open_rate recipient_type → reply_prob followup_count → reply_prob (diminishing) time_since_last → reply_prob ```

**Calendar domain:** ``` meeting_time → attendance_rate attendee_count → slip_risk conflict_degree → reschedule_prob buffer_time → focus_quality ```

**Messaging domain:** ``` response_delay → conversation_continuation message_length → response_length time_of_day → response_prob platform → response_delay ```

**Task domain:** ``` due_date_proximity → completion_prob priority_level → completion_speed task_size → deferral_risk context_switches → error_rate ```

Store graph definitions in `memory/causal/graphs/`.

### C. Estimation

For each "knob" (intervention variable), estimate treatment effects:

```python # Pseudo: effect of morning vs evening sends effect = mean(reply_prob | send_time=morning) - mean(reply_prob | send_time=evening) uncertainty = std_error(effect) ```

Use simple regression or propensity matching first. Graduate to do-calculus when graphs are explicit and identification is needed.

### D. Decision Policy

Before executing actions:

1. Identify intervention variable(s) 2. Query causal model for expected outcome distribution 3. Compute expected utility + uncertainty bounds 4. If uncertainty > threshold OR expected harm > threshold → refuse or escalate to user 5. Log prediction for later validation

## Workflow

### On Every Action

``` BEFORE executing: 1. Log pre_state 2. If enough historical data: query model for expected outcome 3. If high uncertainty or risk: confirm with user

AFTER executing: 1. Log action + context + time 2. Set reminder to check outcome (if not immediate)

WHEN outcome observed: 1. Update action log with post_state + outcome 2. Re-estimate treatment effects if enough new data ```

### Planning an Action

``` 1. User request → identify candidate actions 2. For each action: a. Map to intervention(s) on causal graph b. Predict P(outcome | do(action)) c. Estimate uncertainty d. Compute expected utility 3. Rank by expected utility, filter by safety 4. Execute best action, log prediction 5. Observe outcome, update model ```

### Debugging a Failure

``` 1. Identify failed outcome 2. Trace back through causal graph 3. For each upstream node: a. Was the value as expected? b. Did the causal link hold? 4. Identify broken link(s) 5. Compute minimal intervention set that would have prevented failure 6. Log counterfactual for learning ```

## Quick Start: Bootstrap Today

```bash # 1. Create the infrastructure mkdir -p memory/causal/graphs memory/causal/estimates

# 2. Initialize config cat > memory/causal/config.yaml << 'EOF' domains: - email - calendar - messaging - tasks

thresholds: max_uncertainty: 0.3 min_expected_utility: 0.1

protected_actions: - delete_email - cancel_meeting - send_to_new_contact - financial_transaction EOF

# 3. Backfill one domain (start with email) python3 scripts/backfill_email.py

# 4. Estimate initial effects python3 scripts/estimate_effect.py --treatment send_time --outcome reply_received --values morning,evening ```

## Safety Constraints

Define "protected variables" that require explicit user approval:

```yaml protected: - delete_email - cancel_meeting - send_to_new_contact - financial_transaction

thresholds: max_uncertainty: 0.3 # don't act if P(outcome) uncertainty > 30% min_expected_utility: 0.1 # don't act if expected gain < 10% ```

## Files

- `memory/causal/action_log.jsonl` — all logged actions with outcomes - `memory/causal/graphs/` — domain-specific causal graph definitions - `memory/causal/estimates/` — learned treatment effects - `memory/causal/config.yaml` — safety thresholds and protected variables

## References

- See `references/do-calculus.md` for formal intervention semantics - See `references/estimation.md` for treatment effect estimation methods

More Products