# ElevenLabs TTS (Text-to-Speech)

Generate expressive voice messages using ElevenLabs v3 and audio tags.

## Prerequisites

- **ElevenLabs API Key** (`ELEVENLABS_API_KEY`): Required. Get one at [elevenlabs.io](https://elevenlabs.io) → Profile → API Keys, then set it under `messages.tts.elevenlabs.apiKey` in `openclaw.json`.
- **ffmpeg**: Required for audio format conversion (MP3 → Opus for WhatsApp compatibility). It must be installed and available on your PATH.

## Quick Start Examples

**Storytelling (emotional journey):**
```
[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!
```

**Horror/Suspense (building dread):**
```
[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!
```

**Conversation with reactions:**
```
[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.
```

**Hebrew (romantic moment):**
```
[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?
```

**Spanish (celebration to reflection):**
```
[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.
```

## Configuration (OpenClaw)

In `openclaw.json`, configure TTS under `messages.tts`:

```json
{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "sk_your_api_key_here",
        "voiceId": "pNInz6obpgDQGcFmaJgB",
        "modelId": "eleven_v3",
        "languageCode": "en",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0,
          "useSpeakerBoost": true,
          "speed": 1
        }
      }
    }
  }
}
```

**Getting your API Key:**
1. Go to https://elevenlabs.io
2. Sign up or log in
3. Click your profile → API Keys
4. Copy your key

## Recommended Voices for v3

These premade voices are optimized for v3 and work well with audio tags:

| Voice | ID | Gender | Accent | Best For |
|-------|-----|--------|--------|----------|
| **Adam** | `pNInz6obpgDQGcFmaJgB` | Male | American | Deep narration, general use |
| **Rachel** | `21m00Tcm4TlvDq8ikWAM` | Female | American | Calm narration, conversational |
| **Brian** | `nPczCjzI2devNBz1zQrb` | Male | American | Deep narration, podcasts |
| **Charlotte** | `XB0fDUnXU5powFXDhCwa` | Female | English-Swedish | Expressive, video games |
| **George** | `JBFqnCBsd6RMkjVDRZzb` | Male | British | Raspy narration, storytelling |

**Finding more voices:**
- Browse: https://elevenlabs.io/voice-library
- v3-optimized collection: https://elevenlabs.io/app/voice-library/collections/aF6JALq9R6tXwCczjhKH
- API: `GET https://api.elevenlabs.io/v1/voices`
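The API route in the list above can be queried from a script. The following is a minimal sketch, not an official client: it assumes the key is sent in the `xi-api-key` header and that the response carries a `voices` array whose entries have `name` and `voice_id` fields. The helper names are hypothetical.

```python
import json
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"  # endpoint from the list above


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build the GET request (assumption: the key goes in the xi-api-key header)."""
    return urllib.request.Request(
        VOICES_URL, headers={"xi-api-key": api_key}, method="GET"
    )


def summarize_voices(payload: dict) -> list[tuple[str, str]]:
    """Return (name, voice_id) pairs; the field names are assumed, not confirmed here."""
    return [
        (voice.get("name", ""), voice.get("voice_id", ""))
        for voice in payload.get("voices", [])
    ]


def list_voices(api_key: str) -> list[tuple[str, str]]:
    """Fetch the account's voices over the network and summarize them."""
    with urllib.request.urlopen(build_voices_request(api_key), timeout=30) as resp:
        return summarize_voices(json.load(resp))

# Usage (performs a real network call):
#   import os
#   for name, voice_id in list_voices(os.environ["ELEVENLABS_API_KEY"]):
#       print(voice_id, name)
```

Keeping the request builder and the response parser separate from the network call makes both easy to check offline.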

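**Voice selection tips:**
- Use IVC (Instant Voice Clone) or premade voices; PVC is not yet optimized for v3
- Match the voice's character to your use case (a whispering voice won't shout well)
- For an expressive IVC, include a varied range of emotional tones in the training samples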

## Model Settings

- **Model**: `eleven_v3` (alpha), the only model that supports audio tags
- **Languages**: 70+ languages supported, with full audio tag control

### Stability Modes

| Mode | Stability | Description |
|------|-----------|-------------|
| **Creative** | 0.3-0.5 | More emotional/expressive, may hallucinate |
| **Natural** | 0.5-0.7 | Balanced, closest to original voice |
| **Robust** | 0.7-1.0 | Highly stable, less responsive to tags |

For audio tags, use **Creative** (0.5) or **Natural**. Higher stability reduces tag responsiveness.

### Speed Control

Range: 0.7 (slow) to 1.2 (fast); default 1.0.

Extreme values degrade quality. For pacing control, prefer audio tags such as `[rushed]` or `[drawn out]`.
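The stability modes and the speed range above can be checked before a request is sent. This is a rough sketch with hypothetical helper names. The table's ranges share their endpoints (0.5 and 0.7), so the sketch assigns a shared endpoint to the lower-stability mode, and it treats values below 0.3 as Creative even though the table does not list them.

```python
def stability_mode(stability: float) -> str:
    """Map a stability value to the mode names used in the Stability Modes table.

    Assumption: shared endpoints go to the lower-stability mode, so 0.5 is
    Creative and 0.7 is Natural. Values below 0.3 are also reported as Creative.
    """
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0.0 and 1.0")
    if stability <= 0.5:
        return "Creative"
    if stability <= 0.7:
        return "Natural"
    return "Robust"


def check_voice_settings(settings: dict) -> list[str]:
    """Return warnings for a voiceSettings dict shaped like the config example."""
    warnings = []
    if stability_mode(settings.get("stability", 0.5)) == "Robust":
        warnings.append("Robust stability makes the voice less responsive to audio tags")
    speed = settings.get("speed", 1.0)
    if not 0.7 <= speed <= 1.2:
        warnings.append("speed is outside the documented 0.7-1.2 range")
    return warnings
```

For example, `check_voice_settings({"stability": 0.5, "speed": 1})` returns an empty list, while a stability of 0.8 combined with a speed of 1.5 yields two warnings.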

## Critical Rules

### Length Limits

- **Optimal**: <800 characters per segment (best quality)
- **Maximum**: 10,000 characters (API hard limit)
- **Quality degrades** with longer text: the voice becomes inconsistent

### Audio Tags - Best Practices for Natural Sound

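**How many tags to use:**
- 1-2 tags per sentence or phrase (no more!)
- A tag stays in effect until the next tag, so there is no need to repeat it
- Overusing tags sounds unnatural and robotic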
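The 1-2 tags per sentence guideline can be linted automatically. The sketch below is only a heuristic: it treats any `[...]` span as a tag and splits sentences on `.`, `!`, or `?` followed by whitespace, which will mis-split some text (for example around ellipses). The function name is hypothetical.

```python
import re

TAG = re.compile(r"\[[^\[\]]+\]")               # any bracketed span counts as a tag
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")     # naive sentence boundary


def overtagged_sentences(text: str, max_tags: int = 2) -> list[str]:
    """Return the sentences that carry more than `max_tags` audio tags."""
    sentences = SENTENCE_END.split(text.strip())
    return [s for s in sentences if len(TAG.findall(s)) > max_tags]
```

Calling it on `"[excited] [happy] [laughs] We did it! [sad] But it's over."` flags only the first sentence, which carries three tags.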

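**Where to place tags:**
- At emotional transition points
- Before key dramatic moments
- When the energy or pace changes

**Context matters:**
- Write text that *matches* the tag's emotion
- Longer text with context = better interpretation
- Example: `[nervous] I... I'm not sure about this. What if it doesn't work?` works better than `[nervous] Hello.`

**Combine tags for nuance:**
- `[nervously][whispers]` = nervous whispering
- `[excited][laughs]` = excited laughter
- Keep combinations to 2 tags at most

**Regenerate for best results:**
- v3 is non-deterministic: the same text produces different output
- Generate 3 or more versions and pick the best
- Small text tweaks can improve results

**Match tag to voice:**
- Don't use `[shouts]` on a whispering voice
- Don't use `[whispers]` on a loud, energetic voice
- Test tags with your chosen voice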

### SSML Not Supported

v3 does not support SSML break tags. Use audio tags and punctuation instead.
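If existing scripts already contain SSML breaks, one option is to rewrite them before generation. The sketch below swaps every `<break .../>` element for a `[pause]` tag. That mapping is an assumption made for illustration (the `time` attribute is simply discarded), and the function name is hypothetical.

```python
import re

# Matches SSML break elements such as <break time="1.5s"/> or <break />.
SSML_BREAK = re.compile(r"<break\b[^>]*/?>", re.IGNORECASE)


def replace_ssml_breaks(text: str) -> str:
    """Replace SSML break elements with a [pause] audio tag and tidy the spacing."""
    replaced = SSML_BREAK.sub(" [pause] ", text)
    # Collapse the runs of spaces or tabs that the substitution may introduce.
    return re.sub(r"[ \t]+", " ", replaced).strip()
```

For instance, `replace_ssml_breaks('Wait.<break time="1s"/>Now go.')` returns `Wait. [pause] Now go.`.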

### Punctuation Effects (use with tags!)

Punctuation reinforces audio tags:
- **Ellipses (...)** → dramatic pause: `[nervous] I... I don't know...`
- **CAPS** → emphasis: `[excited] That's AMAZING!`
- **Dashes (—)** → interruption: `[explaining] So what you do is— [interrupting] Wait!`
- **Question marks** → uncertainty: `[nervous] Are you sure about this?`
- **Exclamation!** → energy boost: `[happy] We did it!`

Combine tags + punctuation for maximum effect:
```
[tired] It was a long day... [sighs] Nobody listens anymore.
```

## WhatsApp Voice Messages

### Complete Workflow

1. **Generate** with the `tts` tool (returns MP3)
2. **Convert** to Opus (required for Android!)
3. **Send** with the `message` tool

### Step-by-Step

**1. Generate TTS (add [pause] at end to prevent cutoff):**
```
tts text="[excited] This is amazing! [pause]" channel=whatsapp
```
Returns: `MEDIA:/tmp/tts-xxx/voice-123.mp3`

**2. Convert MP3 → Opus:**
```bash
ffmpeg -i /tmp/tts-xxx/voice-123.mp3 -c:a libopus -b:a 64k -vbr on -application voip /tmp/tts-xxx/voice-123.ogg
```

**3. Send the Opus file:**

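> **Note:** The `message` field below contains a Unicode left-to-right mark (U+200E) between the quotes.
> This is intentional: WhatsApp requires a non-empty message body to send a voice note.
> The LTR mark is invisible, so it satisfies this requirement without displaying any text.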

```
message action=send channel=whatsapp target="+972..." filePath="/tmp/tts-xxx/voice-123.ogg" asVoice=true message="‎"
```
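The conversion in step 2 can also be driven from a script. The sketch below rebuilds the same ffmpeg arguments and writes the `.ogg` next to the input file. The function names are hypothetical, and the `-y` flag (overwrite an existing output without prompting) is an addition that is not part of the original command.

```python
import subprocess
from pathlib import Path


def opus_command(mp3_path: str) -> list[str]:
    """Build the ffmpeg argument list from step 2, targeting a sibling .ogg file."""
    ogg_path = str(Path(mp3_path).with_suffix(".ogg"))
    return [
        "ffmpeg", "-y",              # -y is an assumption: overwrite without asking
        "-i", mp3_path,
        "-c:a", "libopus",
        "-b:a", "64k",
        "-vbr", "on",
        "-application", "voip",
        ogg_path,
    ]


def convert_to_opus(mp3_path: str) -> str:
    """Run ffmpeg (must be on PATH) and return the path of the Opus file."""
    cmd = opus_command(mp3_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails
    return cmd[-1]
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.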

### Why Opus?

| Format | iOS | Android | Transcribe |
|--------|-----|---------|------------|
| MP3 | ✅ Works | ❌ May fail | ❌ No |
| Opus (.ogg) | ✅ Works | ✅ Works | ✅ Yes |

**Always convert to Opus** - it is the only format that:
- Works on all devices (iOS + Android)
- Supports WhatsApp's transcribe button

### Audio Cutoff Fix

ElevenLabs sometimes cuts off the last word. **Always add `[pause]` or `...` at the end:**
```
[excited] This is amazing! [pause]
```
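The cutoff fix is easy to apply automatically before each generation. A minimal sketch, with a hypothetical function name:

```python
def ensure_trailing_pause(text: str) -> str:
    """Append [pause] unless the text already ends with [pause] or an ellipsis."""
    stripped = text.rstrip()
    if not stripped:
        return stripped  # nothing to speak, nothing to protect
    if stripped.endswith("[pause]") or stripped.endswith("..."):
        return stripped
    return f"{stripped} [pause]"
```

The function is idempotent, so running it twice on the same text does not stack a second `[pause]`.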

## Long-Form Audio (Podcasts)

For content longer than 800 characters:

1. Split into short segments (<800 characters each)
2. Generate each segment with the `tts` tool
3. Concatenate with ffmpeg:
   ```bash
   cat > list.txt << EOF
   file '/path/file1.mp3'
   file '/path/file2.mp3'
   EOF
   ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp3
   ```
4. Convert to Opus for WhatsApp
5. Send as a single voice message

**Important**: Do not mention "part 2" or "chapter"; keep the audio seamless.
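Steps 1 and 3 of the list above lend themselves to scripting. The sketch below packs whole sentences greedily into segments under 800 characters and renders the `list.txt` body for the concat step. Both function names are hypothetical. The sentence splitter is naive, a single sentence longer than the limit is kept whole rather than cut mid-sentence, and paths containing single quotes are not escaped.

```python
import re


def split_segments(text: str, limit: int = 800) -> list[str]:
    """Greedily pack whole sentences into segments shorter than `limit` characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) < limit or not current:
            current = candidate          # still fits, or first sentence of a segment
        else:
            segments.append(current)     # close the segment and start a new one
            current = sentence
    if current:
        segments.append(current)
    return segments


def concat_list(paths: list[str]) -> str:
    """Render the list.txt body read by `ffmpeg -f concat`."""
    return "".join(f"file '{path}'\n" for path in paths)
```

Splitting at sentence boundaries keeps each segment self-contained, which helps the joined audio sound seamless.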

## Multi-Speaker Dialogue

v3 can handle multiple characters in a single generation:

```
Jessica: [whispers] Did you hear that?
Chris: [interrupting] —I heard it too!
Jessica: [panicking] We need to hide!
```
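A dialogue script in this `Speaker: [tag] line` layout can be assembled from structured turns. This is an illustrative sketch only; the function name and the tuple layout are assumptions, not part of any tool.

```python
def build_dialogue(turns: list[tuple[str, str, str]]) -> str:
    """Render (speaker, tag, line) turns as 'Speaker: [tag] line', one turn per line."""
    rendered = []
    for speaker, tag, line in turns:
        prefix = f"[{tag}] " if tag else ""   # an empty tag leaves the line untagged
        rendered.append(f"{speaker}: {prefix}{line}")
    return "\n".join(rendered)


script = build_dialogue([
    ("Jessica", "whispers", "Did you hear that?"),
    ("Chris", "interrupting", "—I heard it too!"),
    ("Jessica", "panicking", "We need to hide!"),
])
```

The resulting `script` string matches the three-line example above and can be passed as the text of a single generation.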

**Dialogue tags**: `[interrupting]`, `[overlapping]`, `[cuts in]`, `[interjecting]`

## Audio Tags Quick Reference

| Category | Tags | When to Use |
|----------|------|-------------|
| **Emotions** | [excited], [happy], [sad], [angry], [nervous], [curious] | Main emotional state - use 1 per section |
| **Delivery** | [whispers], [shouts], [soft], [rushed], [drawn out] | Volume/speed changes |
| **Reactions** | [laughs], [sighs], [gasps], [clears throat], [gulps] | Natural human moments - sprinkle sparingly |
| **Pacing** | [pause], [hesitates], [stammers], [breathes] | Dramatic timing |
| **Character** | [French accent], [British accent], [robotic tone] | Character voice shifts |
| **Dialogue** | [interrupting], [overlapping], [cuts in] | Multi-speaker conversations |

**Most effective tags** (reliable results):
- Emotions: `[excited]`, `[nervous]`, `[sad]`, `[happy]`
- Reactions: `[laughs]`, `[sighs]`, `[whispers]`
- Pacing: `[pause]`

**Less reliable** (test and regenerate):
- Sound effects: `[explosion]`, `[gunshot]`
- Accents: results vary by voice

**Full tag list**: See [references/audio-tags.md](references/audio-tags.md)

## Troubleshooting

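**Tags read aloud?**
- Verify that the `eleven_v3` model is in use
- Use an IVC or premade voice, not a PVC
- Simplify tags (drop the "tone" suffix)
- Increase the text length (250+ characters)

**Voice inconsistent?**
- Segments are too long: split at <800 characters
- Regenerate (v3 is non-deterministic)
- Try a lower stability setting

**WhatsApp won't play?**
- Convert to Opus format (see above)

**No emotion despite tags?**
- The voice may not match the tag style
- Try the Creative stability mode (0.5)
- Add more context around the tags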
