Cron Retry

Introduction

# Cron Retry Skill

Automatically detect and retry cron jobs that failed due to network/connection errors.

## Quick Start (Heartbeat Integration)

Add this to your `HEARTBEAT.md`:

```markdown ## Cron Recovery Check Check for cron jobs with lastStatus: "error". If the error matches network patterns (connection error, sendMessage failed, fetch failed, ETIMEDOUT, ECONNREFUSED), retry the job using cron tool with action: "run" and the job ID. Report what was recovered. ```

That's it. On each heartbeat, failed network jobs get retried automatically.

## How It Works

1. On heartbeat, check all cron jobs via `cron list` 2. Filter for jobs where `lastStatus = "error"` and `enabled = true` 3. Check if `lastError` matches network-related patterns 4. Re-run eligible jobs via `cron run` 5. Report results

## Network Error Patterns (Retryable)

These errors indicate transient network issues worth retrying:

- `Network request.*failed` - `Connection error` - `ECONNREFUSED` - `ETIMEDOUT` - `ENOTFOUND` - `sendMessage.*failed` - `fetch failed` - `socket hang up`

## What Gets Retried vs Skipped

**Retried:** - Network timeouts - Connection refused - Message send failures - DNS lookup failures

**Skipped (not retried):** - Logic errors (bad config, missing data) - Auth failures - Disabled jobs - Jobs that just ran successfully

## Manual Recovery Check

To check and retry failed jobs manually:

```bash # List all jobs and their status clawdbot cron list

# Find failed jobs clawdbot cron list | jq '.jobs[] | select(.state.lastStatus == "error") | {name, error: .state.lastError}'

# Retry a specific job clawdbot cron run --id <JOB_ID> ```

## Agent Implementation

When implementing the heartbeat check:

``` 1. Call cron tool with action: "list" 2. For each job in response.jobs: - Skip if job.enabled !== true - Skip if job.state.lastStatus !== "error" - Check if job.state.lastError matches network patterns - If retryable: call cron tool with action: "run", jobId: job.id 3. Report: "Recovered X jobs" or "No failed jobs to recover" ```

## Example Scenario

1. **7:00 PM** — Evening briefing cron fires 2. **Network hiccup** — Telegram send fails 3. **Job marked** `lastStatus: "error"`, `lastError: "Network request for 'sendMessage' failed!"` 4. **7:15 PM** — Connection restored, heartbeat runs 5. **Skill detects** the failed job, sees it's a network error 6. **Retries** the job → briefing delivered 7. **Reports**: "Recovered 1 job: evening-wrap-briefing"

## Safety

- Only retries transient network errors - Respects job enabled state - Won't create retry loops (checks lastRunAtMs) - Reports all recovery attempts

Back

Introduction

More Products

self-improving-agent

Find Skills

Sonoscli