
ElevenLabs Integration with OpenClaw

ClawVox - ElevenLabs voice studio for OpenClaw. Generate speech, transcribe audio, clone voices, create sound effects, and more.


# ClawVox

Transform your OpenClaw assistant into a professional voice production studio with ClawVox - powered by ElevenLabs.

## Quick Reference

| Action | Command | Description |
|--------|---------|-------------|
| Speak | `{baseDir}/scripts/speak.sh 'text'` | Convert text to speech |
| Transcribe | `{baseDir}/scripts/transcribe.sh audio.mp3` | Speech to text |
| Clone | `{baseDir}/scripts/clone.sh --name "Voice" sample.mp3` | Clone a voice |
| SFX | `{baseDir}/scripts/sfx.sh "thunder storm"` | Generate sound effects |
| Voices | `{baseDir}/scripts/voices.sh list` | List available voices |
| Dub | `{baseDir}/scripts/dub.sh --target es audio.mp3` | Translate audio |
| Isolate | `{baseDir}/scripts/isolate.sh audio.mp3` | Remove background noise |

## Setup

1. Get your API key from [elevenlabs.io/app/settings/api-keys](https://elevenlabs.io/app/settings/api-keys)
2. Configure in `~/.openclaw/openclaw.json`:

```json5
{
  skills: {
    entries: {
      "clawvox": {
        apiKey: "YOUR_ELEVENLABS_API_KEY",
        config: {
          defaultVoice: "Rachel",
          defaultModel: "eleven_turbo_v2_5",
          outputDir: "~/.openclaw/audio"
        }
      }
    }
  }
}
```

Or set the environment variable:

```bash
export ELEVENLABS_API_KEY="your_api_key_here"
```
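
Before running any script, it can help to confirm that the key is actually visible to your shell. The following is a minimal sketch, not ClawVox's own validation logic: the function name `check_api_key` is hypothetical, and the 20-character minimum simply mirrors the length check mentioned under Troubleshooting.

```bash
# Hypothetical helper: succeed only if ELEVENLABS_API_KEY looks usable.
# The 20-character minimum is taken from the Troubleshooting section.
check_api_key() {
  local key="${ELEVENLABS_API_KEY:-}"
  if [ -z "$key" ]; then
    echo "ELEVENLABS_API_KEY not set" >&2
    return 1
  fi
  if [ "${#key}" -lt 20 ]; then
    echo "ELEVENLABS_API_KEY looks too short (${#key} characters)" >&2
    return 1
  fi
  return 0
}
```

Calling `check_api_key && echo "key looks fine"` only checks the environment variable; a key stored in `openclaw.json` is not inspected by this sketch.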

## Voice Generation (TTS)

### Basic Text-to-Speech

```bash
# Quick speak with default voice (Rachel)
{baseDir}/scripts/speak.sh 'Hello, I am your personal AI assistant.'

# Specify voice by name
{baseDir}/scripts/speak.sh --voice Adam 'Hello from Adam'

# Save to file
{baseDir}/scripts/speak.sh --out ~/audio/greeting.mp3 'Welcome to the show'

# Use specific model
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 'Bonjour'

# Adjust voice settings
{baseDir}/scripts/speak.sh --stability 0.5 --similarity 0.8 'Expressive speech'

# Adjust speed
{baseDir}/scripts/speak.sh --speed 1.2 'Faster speech'

# Use multilingual model for other languages
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 --voice Rachel 'Hola, que tal'
{baseDir}/scripts/speak.sh --model eleven_multilingual_v2 --voice Adam 'Guten Tag'
```

### Voice Models

| Model | Latency | Languages | Best For |
|-------|---------|-----------|----------|
| `eleven_flash_v2_5` | ~75ms | 32 | Real-time, streaming |
| `eleven_turbo_v2_5` | ~250ms | 32 | Balanced quality/speed |
| `eleven_multilingual_v2` | ~500ms | 29 | Long-form, highest quality |
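
If you switch models often, the table can be encoded as a small lookup. This is only an illustrative sketch: the function `pick_model` and the keywords `realtime`, `balanced`, and `quality` are hypothetical names, not options that ClawVox defines.

```bash
# Hypothetical helper: map a use case to a model ID from the table above.
pick_model() {
  case "$1" in
    realtime) echo "eleven_flash_v2_5" ;;        # lowest latency
    balanced) echo "eleven_turbo_v2_5" ;;        # balanced quality/speed
    quality)  echo "eleven_multilingual_v2" ;;   # long-form, highest quality
    *) echo "unknown use case: $1" >&2; return 1 ;;
  esac
}

# Usage (not executed here):
#   {baseDir}/scripts/speak.sh --model "$(pick_model quality)" 'Bonjour'
```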

### Available Voices

Premade voices: Rachel, Adam, Antoni, Bella, Domi, Elli, Josh, Sam, Callum, Charlie, George, Liam, Matilda, Alice, Bill, Brian, Chris, Daniel, Eric, Jessica, Laura, Lily, River, Roger, Sarah, Will

### Long-Form Content

```bash
# Generate audio from text file
{baseDir}/scripts/speak.sh --input chapter.txt --voice "George" --out audiobook.mp3
```
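
The `--input` option is also a convenient way around shell quoting when the text contains quotes, dollar signs, or exclamation marks. A minimal sketch, assuming a hypothetical helper named `text_to_tempfile` that copies standard input to a temporary file:

```bash
# Hypothetical helper: copy stdin to a temporary file and print its path.
text_to_tempfile() {
  local tmp
  tmp="$(mktemp)" || return 1
  cat > "$tmp"
  printf '%s\n' "$tmp"
}

# Usage (not executed here). The quoted 'EOF' disables shell expansion,
# so the text reaches the file exactly as written:
#   script_file="$(text_to_tempfile <<'EOF'
#   "Wait!" she said. It's $5, isn't it?
#   EOF
#   )"
#   {baseDir}/scripts/speak.sh --input "$script_file" --out scene.mp3
#   rm -f "$script_file"
```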

## Speech-to-Text (Transcription)

### Basic Transcription

```bash
# Transcribe audio file
{baseDir}/scripts/transcribe.sh recording.mp3

# Save to file
{baseDir}/scripts/transcribe.sh --out transcript.txt audio.mp3

# Transcribe with language hint
{baseDir}/scripts/transcribe.sh --language es spanish_audio.mp3

# Include timestamps
{baseDir}/scripts/transcribe.sh --timestamps podcast.mp3
```

### Supported Formats

- MP3, MP4, MPEG, MPGA, M4A, WAV, WebM
- Maximum file size: 100MB
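
When transcribing many files, a quick extension check can filter out unsupported ones before any upload. The sketch below is an assumption-laden illustration: `is_supported_transcription_format` is a hypothetical name, and it only looks at the file extension, not at the actual audio container.

```bash
# Hypothetical helper: succeed if the file extension is in the list above.
is_supported_transcription_format() {
  local ext
  # Take the text after the last dot and lowercase it.
  ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (not executed here):
#   for f in recordings/*; do
#     is_supported_transcription_format "$f" && {baseDir}/scripts/transcribe.sh "$f"
#   done
```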

## Voice Cloning

### Instant Voice Clone

```bash
# Clone from single sample (minimum 30 seconds recommended)
{baseDir}/scripts/clone.sh --name MyVoice recording.mp3

# Clone with description
{baseDir}/scripts/clone.sh --name BusinessVoice \
  --description 'Professional male voice' \
  sample.mp3

# Clone with labels
{baseDir}/scripts/clone.sh --name MyVoice \
  --labels '{"gender":"male","age":"adult"}' \
  sample.mp3

# Remove background noise during cloning
{baseDir}/scripts/clone.sh --name CleanVoice \
  --remove-bg-noise \
  sample.mp3

# Test cloned voice
{baseDir}/scripts/speak.sh --voice MyVoice 'Testing my cloned voice'
```

## Voice Library Management

```bash
# List all available voices
{baseDir}/scripts/voices.sh list

# Get voice details
{baseDir}/scripts/voices.sh info --name Rachel
{baseDir}/scripts/voices.sh info --id 21m00Tcm4TlvDq8ikWAM

# Search voices (filter output with grep)
{baseDir}/scripts/voices.sh list | grep -i "female"

# Filter by category
{baseDir}/scripts/voices.sh list --category premade
{baseDir}/scripts/voices.sh list --category cloned

# Download voice preview
{baseDir}/scripts/voices.sh preview --name Rachel -o preview.mp3

# Delete custom voice
{baseDir}/scripts/voices.sh delete --id "voice_id"
```

## Sound Effects

```bash
# Generate sound effect
{baseDir}/scripts/sfx.sh 'Heavy rain on a tin roof'

# With duration
{baseDir}/scripts/sfx.sh --duration 5 'Forest ambiance with birds'

# With prompt influence (higher = more accurate)
{baseDir}/scripts/sfx.sh --influence 0.8 'Sci-fi laser gun firing'

# Save to file
{baseDir}/scripts/sfx.sh --out effects/thunder.mp3 'Rolling thunder'
```

**Note:** Duration range is 0.5 to 22 seconds (rounded to nearest 0.5)
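
To predict the effective duration for a given request, the rule in the note can be written out as arithmetic. This is a sketch of the stated rule only; `normalize_sfx_duration` is a hypothetical helper, and whether `sfx.sh` or the API performs the rounding is not something this example assumes.

```bash
# Hypothetical helper: round to the nearest 0.5 and clamp to 0.5..22 seconds.
normalize_sfx_duration() {
  awk -v d="$1" 'BEGIN {
    r = int(d * 2 + 0.5) / 2      # round to the nearest 0.5
    if (r < 0.5) r = 0.5          # lower bound from the note
    if (r > 22)  r = 22           # upper bound from the note
    printf "%.1f\n", r
  }'
}

normalize_sfx_duration 5.3    # prints 5.5
normalize_sfx_duration 30     # prints 22.0
```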

## Voice Isolation

```bash
# Remove background noise and isolate voice
{baseDir}/scripts/isolate.sh noisy_recording.mp3

# Save to specific file
{baseDir}/scripts/isolate.sh --out clean_voice.mp3 meeting_recording.mp3

# Don't tag audio events
{baseDir}/scripts/isolate.sh --no-audio-events recording.mp3
```

**Requirements:**

- Minimum duration: 4.6 seconds
- Supported formats: MP3, WAV, M4A, OGG, FLAC

## Dubbing (Multi-Language Translation)

```bash
# Dub audio to Spanish
{baseDir}/scripts/dub.sh --target es audio.mp3

# Dub with source language specified
{baseDir}/scripts/dub.sh --source en --target ja video.mp4

# Check dubbing status
{baseDir}/scripts/dub.sh --status --id "dubbing_id"

# Download dubbed audio
{baseDir}/scripts/dub.sh --download --id "dubbing_id" --out dubbed.mp3
```

**Supported languages:** en, es, fr, de, it, pt, pl, hi, ar, zh, ja, ko, nl, ru, tr, vi, sv, da, fi, cs, el, he, id, ms, no, ro, uk, hu, th
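
Validating the language code locally avoids a failed dubbing job caused by a typo. A minimal sketch, assuming the list above is complete; the variable `DUB_LANGS` and the function `is_dub_language` are hypothetical names.

```bash
# Language codes copied from the supported-languages list above.
DUB_LANGS="en es fr de it pt pl hi ar zh ja ko nl ru tr vi sv da fi cs el he id ms no ro uk hu th"

# Hypothetical helper: succeed if $1 is exactly one supported code.
is_dub_language() {
  # Reject empty input and anything containing a space.
  case "$1" in
    ""|*" "*) return 1 ;;
  esac
  # Pad both sides with spaces so only whole codes match.
  case " $DUB_LANGS " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (not executed here):
#   is_dub_language "$target" && {baseDir}/scripts/dub.sh --target "$target" audio.mp3
```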

## API Usage Examples

All scripts use curl under the hood, so you can also call the API directly:

```bash
# Direct TTS API call
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/VOICE_ID" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "model_id": "eleven_turbo_v2_5"}' \
  --output speech.mp3
```
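
Hand-written JSON breaks as soon as the text contains a double quote or a newline. One way to handle this, sketched below, is to build the request body with `jq` and have curl report the HTTP status separately from the audio. The function names `tts_payload` and `tts_request` are hypothetical, and this is not necessarily how the bundled scripts build their requests.

```bash
# Build the JSON body with jq so quotes and newlines in the text are escaped.
# The default model is the one used in the example call above.
tts_payload() {
  jq -cn --arg text "$1" --arg model "${2:-eleven_turbo_v2_5}" \
    '{text: $text, model_id: $model}'
}

# Hypothetical wrapper: tts_request VOICE_ID TEXT OUTFILE
# Prints the HTTP status code; the audio (or error body) is written to OUTFILE.
tts_request() {
  tts_payload "$2" | curl -sS -X POST \
    "https://api.elevenlabs.io/v1/text-to-speech/$1" \
    -H "xi-api-key: $ELEVENLABS_API_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @- \
    --output "$3" \
    --write-out '%{http_code}\n'
}

# Usage (not executed here):
#   status="$(tts_request VOICE_ID 'She said "hello"' speech.mp3)"
```

Capturing the status code this way means a non-200 response can be detected before the output file is treated as audio.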

## Error Handling

All scripts provide helpful error messages:

- **401**: Authentication failed - Check your API key
- **403**: Permission denied - Your API key may not have access
- **429**: Rate limit exceeded - Wait before trying again
- **500/502/503**: ElevenLabs API issues - Try again later
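
If you call the API yourself, the same mapping is easy to reproduce. The sketch below is illustrative and not the scripts' actual code: `explain_http_status` and `is_retryable_status` are hypothetical names, and treating 429 and 5xx as retryable is an interpretation of "wait" and "try again later" in the list above.

```bash
# Hypothetical helper: turn an HTTP status code into a short hint.
explain_http_status() {
  case "$1" in
    2??) echo "OK" ;;
    401) echo "Authentication failed: check your API key" ;;
    403) echo "Permission denied: your API key may not have access" ;;
    429) echo "Rate limit exceeded: wait before trying again" ;;
    500|502|503) echo "ElevenLabs API issue: try again later" ;;
    *) echo "Unexpected HTTP status: $1" ;;
  esac
}

# Assumption: only rate limits and server-side errors are worth retrying.
is_retryable_status() {
  case "$1" in
    429|500|502|503) return 0 ;;
    *) return 1 ;;
  esac
}
```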

## Testing

Run the test suite to verify everything works:

```bash
{baseDir}/test.sh YOUR_API_KEY
```

Or with environment variable:

```bash
export ELEVENLABS_API_KEY="your_key"
{baseDir}/test.sh
```

## Troubleshooting

### Common Issues

1. **"exec host not allowed (requested gateway)"**
   - The skill needs to run commands in a sandbox environment
   - Configure OpenClaw to use sandbox: `tools.exec.host: "sandbox"`
   - Or enable sandboxing in your OpenClaw config
   - Alternative: Configure exec approvals for gateway host (see OpenClaw docs)

2. **Parse errors with quotes or exclamation marks**
   - Use single quotes instead of double quotes: `'Hello world'` not `"Hello world!"`
   - Avoid exclamation marks (`!`) in text when using double quotes
   - For complex text, use the `--input` option with a file

3. **"ELEVENLABS_API_KEY not set"**
   - Ensure `ELEVENLABS_API_KEY` is set or configured in openclaw.json
   - Check that the API key is at least 20 characters long

4. **"jq is required but not installed"**
   - Install jq: `apt-get install jq` (Linux) or `brew install jq` (macOS)

5. **"Rate limited"**
   - Check your ElevenLabs plan quota at elevenlabs.io/app/usage
   - Free tier: ~10,000 characters/month

6. **"Voice not found"**
   - Use `{baseDir}/scripts/voices.sh list` to see available voices
   - Check if the voice ID is correct

7. **"Dubbing failed"**
   - Ensure source audio is clear and audible
   - Check supported language codes

8. **"File too large"**
   - Transcription: 100MB max
   - Dubbing: 500MB max
   - Voice cloning: 50MB per file
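
The size limits in the last item can be checked locally before uploading. A minimal sketch under stated assumptions: `max_mb_for` and `check_file_size` are hypothetical helpers, the operation names `transcribe`, `dub`, and `clone` are chosen for illustration, and 1 MB is taken to mean 1024 × 1024 bytes, which the source does not specify.

```bash
# Limits in MB as listed above.
max_mb_for() {
  case "$1" in
    transcribe) echo 100 ;;
    dub)        echo 500 ;;
    clone)      echo 50 ;;
    *)          return 1 ;;
  esac
}

# check_file_size OPERATION FILE
# Returns 0 if FILE is within the limit, 1 if too large, 2 on bad input.
check_file_size() {
  local limit_mb bytes
  limit_mb="$(max_mb_for "$1")" || { echo "unknown operation: $1" >&2; return 2; }
  [ -f "$2" ] || { echo "no such file: $2" >&2; return 2; }
  # Arithmetic expansion strips the padding some wc implementations add.
  bytes=$(( $(wc -c < "$2") ))
  if [ "$bytes" -gt $((limit_mb * 1024 * 1024)) ]; then
    echo "File too large: $2 exceeds ${limit_mb}MB" >&2
    return 1
  fi
  return 0
}

# Usage (not executed here):
#   check_file_size transcribe podcast.mp3 && {baseDir}/scripts/transcribe.sh podcast.mp3
```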

### Debug Mode

```bash
# Enable verbose output
DEBUG=1 {baseDir}/scripts/speak.sh 'test'

# Show API request details
DEBUG=1 {baseDir}/scripts/transcribe.sh audio.mp3
```

## Pricing Notes

ElevenLabs API pricing (approximate):

- **Flash v2.5**: ~$0.06/min
- **Turbo v2.5**: ~$0.06/min
- **Multilingual v2**: ~$0.12/min
- **Voice cloning**: Included in plan
- **Sound effects**: ~$0.02/generation
- **Transcription**: ~$0.02/min (Scribe v1)

Free tier: ~10,000 characters/month
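
For rough budgeting, the per-minute figures reduce to a single multiplication. The sketch below uses the approximate rates listed above as-is, so its output is only as accurate as those estimates; `estimate_cost` and its item keywords are hypothetical names.

```bash
# Hypothetical helper: estimate_cost ITEM MINUTES
# Rates are the approximate per-minute prices from the list above.
estimate_cost() {
  local rate
  case "$1" in
    flash|turbo)  rate=0.06 ;;
    multilingual) rate=0.12 ;;
    transcribe)   rate=0.02 ;;
    *) echo "unknown item: $1" >&2; return 1 ;;
  esac
  awk -v r="$rate" -v m="$2" 'BEGIN { printf "%.2f\n", r * m }'
}

estimate_cost multilingual 10   # prints 1.20
estimate_cost transcribe 45     # prints 0.90
```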

## Links

- [ElevenLabs Dashboard](https://elevenlabs.io/app)
- [API Documentation](https://elevenlabs.io/docs)
- [Voice Library](https://elevenlabs.io/voice-library)
- [Pricing](https://elevenlabs.io/pricing)
