# ElevenLabs Speech-to-Text

Transcribe audio files using ElevenLabs' Scribe v2 model. Supports 90+ languages with speaker diarization.

## Quick Start

```bash
# Basic transcription
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3

# With speaker diarization
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --diarize

# Specify language (improves accuracy)
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --lang en

# Full JSON output with timestamps
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --json
```

## Options

| Flag | Description |
|------|-------------|
| `--diarize` | Identify different speakers |
| `--lang CODE` | ISO language code (e.g., en, pt, es) |
| `--json` | Output full JSON with word timestamps |
| `--events` | Tag audio events (laughter, music, etc.) |

## Supported Formats

Accepts all major audio and video formats, including mp3, m4a, wav, ogg, webm, and mp4.

## API Key

Set the `ELEVENLABS_API_KEY` environment variable, or configure the key in `clawdbot.json`:

```json5
{
  skills: {
    entries: {
      "elevenlabs-stt": {
        apiKey: "sk_..."
      }
    }
  }
}
```
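When the script is run directly from a shell and relies on the environment variable, it can help to fail early if the key is missing. The following is a minimal sketch; `require_api_key` is a hypothetical helper and not part of the skill, and it checks only the environment variable, not `clawdbot.json`.

```bash
# Hypothetical helper (not shipped with the skill): returns non-zero
# when ELEVENLABS_API_KEY is unset or empty in the current environment.
require_api_key() {
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set" >&2
    return 1
  fi
}

# Example use before a transcription call:
#   require_api_key && {baseDir}/scripts/transcribe.sh /path/to/audio.mp3
```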

## Examples

```bash
# Transcribe a WhatsApp voice note
{baseDir}/scripts/transcribe.sh ~/Downloads/voice_note.ogg

# Meeting recording with multiple speakers
{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize --lang en

# Get JSON for processing
{baseDir}/scripts/transcribe.sh podcast.mp3 --json > transcript.json
```
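For a folder of recordings, the single-file calls above could be wrapped in a loop. This is a sketch under stated assumptions: `transcribe_dir` is a hypothetical helper, the extension list is taken from the Supported Formats section, and the script is assumed to write the transcript to standard output (as the redirect in the JSON example suggests).

```bash
# Hypothetical batch helper (not part of the skill).
# Usage: transcribe_dir /path/to/transcribe.sh /path/to/dir [flags...]
# Writes one .txt file next to each audio file; extra flags such as
# --diarize or --lang en are passed through to the script unchanged.
transcribe_dir() {
  local script="$1" dir="$2" f failed=0
  shift 2
  for f in "$dir"/*.mp3 "$dir"/*.m4a "$dir"/*.wav \
           "$dir"/*.ogg "$dir"/*.webm "$dir"/*.mp4; do
    [ -e "$f" ] || continue          # skip patterns that matched nothing
    if ! "$script" "$f" "$@" > "${f%.*}.txt"; then
      echo "transcription failed: $f" >&2
      failed=1
    fi
  done
  return "$failed"                   # non-zero if any file failed
}
```

A call such as `transcribe_dir {baseDir}/scripts/transcribe.sh ~/recordings --diarize` would then leave `name.txt` beside each `name.mp3`. Note that two files sharing a base name (for example `a.mp3` and `a.wav`) would write to the same output file.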
