# ElevenLabs TTS (Text-to-Speech)
Generate expressive voice messages with ElevenLabs v3 and audio tags.
## Prerequisites
- **ElevenLabs API Key** (`ELEVENLABS_API_KEY`): Required. Get one at [elevenlabs.io](https://elevenlabs.io) → Profile → API Keys, then set it in `openclaw.json` under `messages.tts.elevenlabs.apiKey`.
- **ffmpeg**: Required for audio format conversion (MP3 → Opus for WhatsApp compatibility). It must be installed and available on PATH.
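Both prerequisites can be checked before the first generation. The sketch below is a minimal preflight check, not part of OpenClaw. The function name `missing_prerequisites` is hypothetical, and it assumes the key is also exported as the `ELEVENLABS_API_KEY` environment variable rather than set only in `openclaw.json`.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # Assumption: the key is available as an environment variable.
    if not env.get("ELEVENLABS_API_KEY"):
        problems.append("ELEVENLABS_API_KEY is not set")
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the real environment; the `env` and `which` parameters exist only so the check can be exercised without touching the system.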
## Quick Start Examples
**Storytelling (emotional journey):**
```
[soft] It started like any other day... [pause] But something felt different. [nervous] My hands were shaking as I opened the envelope. [gasps] I got in! [excited] I actually got in! [laughs] [happy] This changes everything!
```

**Horror/Suspense (building dread):**
```
[whispers] The house has been empty for years... [pause] At least, that's what they told me. [nervous] But I keep hearing footsteps. [scared] They're getting closer. [gasps] [panicking] The door— it's opening by itself!
```

**Conversation with reactions:**
```
[curious] So what happened at the meeting? [pause] [surprised] Wait, they fired him?! [gasps] [sad] That's terrible... [sighs] He had a family. [thoughtful] I wonder what he'll do now.
```

**Hebrew (romantic moment):**
```
[soft] היא עמדה שם, מול השקיעה... [pause] הלב שלי פעם כל כך חזק. [nervous] לא ידעתי מה להגיד. [hesitates] אני... [breathes] [tender] את יודעת שאני אוהב אותך, נכון?
```

**Spanish (celebration to reflection):**
```
[excited] ¡Lo logramos! [laughs] [happy] No puedo creerlo... [pause] [thoughtful] Fueron tantos años de trabajo. [emotional] [soft] Gracias a todos los que creyeron en mí. [sighs] [content] Valió la pena cada momento.
```
## Configuration (OpenClaw)
In `openclaw.json`, configure TTS under `messages.tts`:
```json
{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "apiKey": "sk_your_api_key_here",
        "voiceId": "pNInz6obpgDQGcFmaJgB",
        "modelId": "eleven_v3",
        "languageCode": "en",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0,
          "useSpeakerBoost": true,
          "speed": 1
        }
      }
    }
  }
}
```
**Getting your API Key:**
1. Go to https://elevenlabs.io
2. Sign up or log in
3. Click your profile → API Keys
4. Copy your key
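A quick sanity check of the configuration can catch a missing key before the first request. This is a sketch only: `tts_config_problems` is a hypothetical helper, and treating `apiKey`, `voiceId`, and `modelId` as required is an assumption for illustration, not something OpenClaw is known to enforce.

```python
import json

# Assumption: these three keys are treated as required for illustration.
REQUIRED_KEYS = ("apiKey", "voiceId", "modelId")


def tts_config_problems(config):
    """List problems in the messages.tts section of a parsed openclaw.json."""
    tts = config.get("messages", {}).get("tts", {})
    problems = []
    if tts.get("provider") != "elevenlabs":
        problems.append("messages.tts.provider is not 'elevenlabs'")
    eleven = tts.get("elevenlabs", {})
    for key in REQUIRED_KEYS:
        if not eleven.get(key):
            problems.append(f"messages.tts.elevenlabs.{key} is missing")
    return problems


sample = json.loads("""
{"messages": {"tts": {"provider": "elevenlabs",
  "elevenlabs": {"apiKey": "sk_your_api_key_here",
                 "voiceId": "pNInz6obpgDQGcFmaJgB",
                 "modelId": "eleven_v3"}}}}
""")
```

With the sample above, `tts_config_problems(sample)` returns an empty list; an empty config reports the provider and all three keys.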
## Recommended Voices for v3
These premade voices are optimized for v3 and work well with audio tags:
| Voice | ID | Gender | Accent | Best For |
|-------|-----|--------|--------|----------|
| **Adam** | `pNInz6obpgDQGcFmaJgB` | Male | American | Deep narration, general use |
| **Rachel** | `21m00Tcm4TlvDq8ikWAM` | Female | American | Calm narration, conversational |
| **Brian** | `nPczCjzI2devNBz1zQrb` | Male | American | Deep narration, podcasts |
| **Charlotte** | `XB0fDUnXU5powFXDhCwa` | Female | English-Swedish | Expressive, video games |
| **George** | `JBFqnCBsd6RMkjVDRZzb` | Male | British | Raspy narration, storytelling |

**Finding more voices:**
- Browse: https://elevenlabs.io/voice-library
- v3-optimized collection: https://elevenlabs.io/app/voice-library/collections/aF6JALq9R6tXwCczjhKH
- API: `GET https://api.elevenlabs.io/v1/voices`

**Voice selection tips:**
- Use IVC (Instant Voice Clone) or premade voices; PVC is not yet optimized for v3
- Match the voice's character to your use case (a whispering voice won't shout well)
- For expressive IVCs, include a range of emotional tones in the training samples
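The `GET /v1/voices` endpoint listed above can be queried with the standard library alone. The sketch below assumes the API accepts the key in an `xi-api-key` header and returns a JSON object with a `voices` array whose entries carry `name` and `voice_id`; neither detail comes from this document, so check both against the ElevenLabs API reference. The function names are hypothetical.

```python
import json
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def build_voices_request(api_key):
    """Build (but do not send) the GET request for the voices endpoint."""
    # Assumption: the key travels in the xi-api-key header.
    return urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})


def voice_names(payload):
    """Map voice name -> voice ID from a decoded response body."""
    # Assumption: the body looks like {"voices": [{"name": ..., "voice_id": ...}]}.
    return {v["name"]: v["voice_id"] for v in payload.get("voices", [])}


def fetch_voice_names(api_key):
    """Send the request and return the name -> ID mapping (needs network access)."""
    with urllib.request.urlopen(build_voices_request(api_key)) as response:
        return voice_names(json.load(response))
```

Keeping the request builder and the parser separate from the network call makes each piece easy to inspect without spending API quota.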
## Model Settings
- **Model**: `eleven_v3` (alpha), the only model that supports audio tags
- **Languages**: 70+ languages supported, with full audio tag control
### Stability Modes
| Mode | Stability | Description |
|------|-----------|-------------|
| **Creative** | 0.3-0.5 | More emotional/expressive, may hallucinate |
| **Natural** | 0.5-0.7 | Balanced, closest to original voice |
| **Robust** | 0.7-1.0 | Highly stable, less responsive to tags |

For audio tags, use **Creative** (0.5) or **Natural**. Higher stability reduces tag responsiveness.
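The table above can be turned into a small lookup. This is an illustrative sketch with hypothetical function names. The ranges in the table share their endpoints, so it assumes a boundary value belongs to the lower mode (0.5 counts as Creative, 0.7 as Natural), and it returns `None` below 0.3 because the table does not cover that range.

```python
def stability_mode(stability):
    """Name the documented mode for a stability value, or None if uncovered."""
    if not 0.0 <= stability <= 1.0:
        raise ValueError("stability must be between 0.0 and 1.0")
    if stability < 0.3:
        return None  # below the lowest range in the table
    if stability <= 0.5:
        return "Creative"  # assumption: 0.5 is counted as Creative
    if stability <= 0.7:
        return "Natural"  # assumption: 0.7 is counted as Natural
    return "Robust"


def suits_audio_tags(stability):
    """True when the value falls in a mode recommended for audio tags."""
    return stability_mode(stability) in ("Creative", "Natural")
```

For example, `suits_audio_tags(0.5)` is true, while `suits_audio_tags(0.8)` is false because 0.8 falls in the Robust range.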
### Speed Control
Range: 0.7 (slow) to 1.2 (fast); default 1.0.
Extreme values degrade quality. For pacing control, prefer audio tags such as `[rushed]` or `[drawn out]`.
## Critical Rules
### Length Limits
- **Optimal**: under 800 characters per segment (best quality)
- **Maximum**: 10,000 characters (API hard limit)
- **Quality degrades** with longer text: the voice becomes inconsistent
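Keeping each segment under 800 characters is easy to automate. The sketch below is one possible splitter, not the method the `tts` tool uses. It assumes sentence-ending punctuation followed by whitespace is a safe place to cut, and it falls back to cutting on a space when a single sentence exceeds the limit. `split_into_segments` is a hypothetical name.

```python
import re


def split_into_segments(text, limit=800):
    """Split text into segments shorter than `limit`, breaking after sentences."""
    if limit < 2:
        raise ValueError("limit must be at least 2")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        # Fallback: hard-wrap a single oversized sentence on whitespace.
        while len(sentence) >= limit:
            cut = sentence.rfind(" ", 0, limit)
            cut = cut if cut > 0 else limit - 1
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:cut])
            sentence = sentence[cut:].lstrip()
        candidate = f"{current} {sentence}".strip()
        if len(candidate) < limit:
            current = candidate  # the sentence still fits in this segment
        else:
            segments.append(current)
            current = sentence  # start a new segment with this sentence
    if current:
        segments.append(current)
    return segments
```

Splitting at sentence boundaries keeps each audio tag next to the text it applies to, which a fixed-width cut would not guarantee.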
### Audio Tags - Best Practices for Natural Sound
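**How many tags to use:**
- 1-2 tags per sentence or phrase (no more!)
- A tag stays in effect until the next tag, so there is no need to repeat it
- Overusing tags sounds unnatural and robotic

**Where to place tags:**
- At emotional transition points
- Before key dramatic moments
- When the energy or pacing changes

**Context matters:**
- Write text that *matches* the tag's emotion
- Longer text with context = better interpretation
- Example: `[nervous] I... I'm not sure about this. What if it doesn't work?` works better than `[nervous] Hello.`

**Combine tags for nuance:**
- `[nervously][whispers]` = nervous whispering
- `[excited][laughs]` = excited laughter
- Keep combinations to 2 tags at most

**Regenerate for best results:**
- v3 is non-deterministic: the same text produces different output
- Generate 3 or more versions and pick the best
- Small text tweaks can improve results

**Match tag to voice:**
- Don't use `[shouts]` on a whispering voice
- Don't use `[whispers]` on a loud or energetic voice
- Test tags with the voice you have chosen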
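The "1-2 tags per sentence or phrase" guideline above can be checked mechanically before generating. The sketch below is a rough lint, with a hypothetical function name. It assumes a phrase ends at `.`, `!`, or `?` followed by whitespace, which is only an approximation of how a listener would perceive phrases.

```python
import re

# Matches one bracketed audio tag such as [pause] or [drawn out].
TAG_PATTERN = re.compile(r"\[[^\[\]]+\]")


def overtagged_phrases(text, max_tags=2):
    """Return the phrases that carry more than `max_tags` audio tags."""
    # Assumption: sentence-ending punctuation plus whitespace ends a phrase.
    phrases = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in phrases if len(TAG_PATTERN.findall(p)) > max_tags]
```

An empty result means no phrase exceeds the limit; any returned phrase is a candidate for trimming tags before spending a generation on it.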
### SSML Not Supported
v3 does not support SSML break tags. Use audio tags and punctuation instead.
### Punctuation Effects (use with tags!)
Punctuation enhances audio tags:
- **Ellipses (...)** → dramatic pause: `[nervous] I... I don't know...`
- **CAPS** → emphasis: `[excited] That's AMAZING!`
- **Dashes (—)** → interruption: `[explaining] So what you do is— [interrupting] Wait!`
- **Question marks** → uncertainty: `[nervous] Are you sure about this?`
- **Exclamation!** → energy boost: `[happy] We did it!`

Combine tags and punctuation for maximum effect:
```
[tired] It was a long day... [sighs] Nobody listens anymore.
```
## WhatsApp Voice Messages
### Complete Workflow
1. **Generate** with the `tts` tool (returns MP3)
2. **Convert** to Opus (required for Android!)
3. **Send** with the `message` tool
### Step-by-Step
**1. Generate TTS (add [pause] at end to prevent cutoff):**
```
tts text="[excited] This is amazing! [pause]" channel=whatsapp
```
Returns: `MEDIA:/tmp/tts-xxx/voice-123.mp3`

**2. Convert MP3 → Opus:**
```bash
ffmpeg -i /tmp/tts-xxx/voice-123.mp3 -c:a libopus -b:a 64k -vbr on -application voip /tmp/tts-xxx/voice-123.ogg
```
**3. Send the Opus file:**
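> **Note:** The `message` field below contains a Unicode left-to-right mark (U+200E) between the quotes.
> This is intentional: WhatsApp requires a non-empty message body to send a voice note.
> The LTR mark is invisible, so it satisfies this requirement without displaying any text.
```
message action=send channel=whatsapp target="+972..." filePath="/tmp/tts-xxx/voice-123.ogg" asVoice=true message="‎"
```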
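Step 2 of the workflow (MP3 → Opus) can be wrapped in a helper when it has to run for many files. The sketch below builds the same ffmpeg arguments shown in step 2 and runs them with `subprocess`. The function names are hypothetical, and it assumes the output should sit next to the input with an `.ogg` extension.

```python
import os
import subprocess


def opus_command(mp3_path):
    """Build the ffmpeg argument list for MP3 -> Opus, mirroring the flags above."""
    base, _ = os.path.splitext(mp3_path)
    target = base + ".ogg"  # assumption: write the result next to the input
    return [
        "ffmpeg", "-i", mp3_path,
        "-c:a", "libopus", "-b:a", "64k",
        "-vbr", "on", "-application", "voip",
        target,
    ]


def convert_to_opus(mp3_path):
    """Run the conversion and return the .ogg path; raises if ffmpeg fails."""
    command = opus_command(mp3_path)
    subprocess.run(command, check=True)
    return command[-1]
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.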
### Why Opus?
| Format | iOS | Android | Transcribe |
|--------|-----|---------|------------|
| MP3 | ✅ Works | ❌ May fail | ❌ No |
| Opus (.ogg) | ✅ Works | ✅ Works | ✅ Yes |

**Always convert to Opus.** It is the only format that:
- Works on all devices (iOS + Android)
- Supports WhatsApp's transcribe button
### Audio Cutoff Fix
ElevenLabs sometimes cuts off the last word. **Always add `[pause]` or `...` at the end:**
```
[excited] This is amazing! [pause]
```
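The cutoff fix above is simple to apply automatically. A minimal sketch, with a hypothetical function name, that leaves text alone when it already ends in `[pause]` or `...`:

```python
def ensure_trailing_pause(text):
    """Append [pause] unless the text already ends with [pause] or an ellipsis."""
    stripped = text.rstrip()
    if stripped.endswith(("[pause]", "...")):
        return stripped
    return f"{stripped} [pause]"
```

Running every segment through this helper just before the `tts` call keeps the rule from being forgotten on individual messages.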
## Long-Form Audio (Podcasts)
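For content longer than 800 characters:
1. Split into short segments (under 800 characters each)
2. Generate each segment with the `tts` tool
3. Concatenate with ffmpeg:
   ```bash
   cat > list.txt << EOF
   file '/path/file1.mp3'
   file '/path/file2.mp3'
   EOF
   ffmpeg -f concat -safe 0 -i list.txt -c copy final.mp3
   ```
4. Convert to Opus for WhatsApp
5. Send as a single voice message

**Important**: Don't mention "part 2" or "chapter" in the text; keep the segments seamless.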
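The concatenation step above can also be driven from a script when the number of segments varies. The sketch below writes the same kind of list file and runs the same ffmpeg concat command. The function names and default file names are hypothetical, and the single-quote escaping follows the quoting convention of ffmpeg's concat list files as I understand it.

```python
import subprocess


def concat_list(paths):
    """Render the list file the ffmpeg concat demuxer reads, one entry per segment."""
    lines = []
    for path in paths:
        # Inside single quotes, a literal quote is written as '\'' in the list file.
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_segments(paths, output="final.mp3", list_path="list.txt"):
    """Write the list file and join the segments without re-encoding."""
    with open(list_path, "w", encoding="utf-8") as handle:
        handle.write(concat_list(paths))
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", output],
        check=True,
    )
    return output
```

Because `-c copy` joins the files without re-encoding, the segments should share the same codec and parameters, which holds when they all come from the same `tts` configuration.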
## Multi-Speaker Dialogue
v3 can handle multiple characters in a single generation:
```
Jessica: [whispers] Did you hear that?
Chris: [interrupting] —I heard it too!
Jessica: [panicking] We need to hide!
```
**Dialogue tags**: `[interrupting]`, `[overlapping]`, `[cuts in]`, `[interjecting]`
## Audio Tags Quick Reference
| Category | Tags | When to Use |
|----------|------|-------------|
| **Emotions** | [excited], [happy], [sad], [angry], [nervous], [curious] | Main emotional state - use 1 per section |
| **Delivery** | [whispers], [shouts], [soft], [rushed], [drawn out] | Volume/speed changes |
| **Reactions** | [laughs], [sighs], [gasps], [clears throat], [gulps] | Natural human moments - sprinkle sparingly |
| **Pacing** | [pause], [hesitates], [stammers], [breathes] | Dramatic timing |
| **Character** | [French accent], [British accent], [robotic tone] | Character voice shifts |
| **Dialogue** | [interrupting], [overlapping], [cuts in] | Multi-speaker conversations |

**Most effective tags** (reliable results):
- Emotions: `[excited]`, `[nervous]`, `[sad]`, `[happy]`
- Reactions: `[laughs]`, `[sighs]`, `[whispers]`
- Pacing: `[pause]`

**Less reliable** (test and regenerate):
- Sound effects: `[explosion]`, `[gunshot]`
- Accents: results vary by voice

**Full tag list**: See [references/audio-tags.md](references/audio-tags.md)
## Troubleshooting
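**Tags read aloud?**
- Verify that the `eleven_v3` model is in use
- Use IVC or premade voices, not PVC
- Simplify tags (no "tone" suffix)
- Increase text length (250+ characters)

**Voice inconsistent?**
- The segment is too long: split at under 800 characters
- Regenerate (v3 is non-deterministic)
- Try a lower stability setting

**WhatsApp won't play?**
- Convert to Opus format (see above)

**No emotion despite tags?**
- The voice may not match the tag style
- Try the Creative stability mode (0.5)
- Add more context around the tag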