# 🎙️ Audio Content Generator

Generate high-quality audiobooks, podcasts, or educational audio on demand, using AI-written scripts and ElevenLabs text-to-speech.

## Quick Start

**Create an audiobook chapter:**
```
User: "Create a 5-minute audiobook chapter about a dragon discovering friendship"
```

**Generate a podcast:**
```
User: "Make a 10-minute podcast about the history of coffee"
```

**Produce educational content:**
```
User: "Generate a 15-minute educational audio explaining how neural networks work"
```

## Content Formats

### Audiobook

**Style:** Narrative storytelling with emotional depth
- A clear beginning, middle, and end
- Descriptive language and vivid imagery
- Dramatic pacing with deliberate pauses
- An emotional tone that matches the story
- Voice effects (such as `[whispers]`, `[excited]`, `[serious]`) for added expressiveness

**Example structure:**
```
[Opening hook - set the scene]
[long pause]

[Story development with character emotions]
[short pause] between sentences
[long pause] between paragraphs

[Climax with dramatic tension]
[long pause]

[Resolution and emotional closure]
```

### Podcast

**Style:** Conversational and engaging
- A warm, welcoming intro (15-30 seconds)
- Main content that flows naturally
- Transitions between topics
- A memorable outro with key takeaways
- A conversational tone throughout

**Example structure:**
```
**Intro:** "Welcome to [topic]. I'm excited to share..."
[short pause]

**Main Content:** "Let's start with... [topic 1]"
[long pause] between segments

**Outro:** "Thanks for listening! Remember..."
```

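### Educational Content

**Style:** Clear explanations suited to learning
- A simple introduction to a complex topic
- A step-by-step breakdown
- Real-world examples and analogies
- A recap of key concepts at the end
- Enthusiastic delivery, with `[excited]` on the key points

**Example structure:**
```
**Introduction:** What is [topic] and why it matters?

**Main Content:**
- Concept 1: Explanation + Example
- Concept 2: Explanation + Example
- Concept 3: Explanation + Example

**Summary:** Key takeaways and next steps
```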

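## Length Guidelines

**Word count to duration:**
- 5 minutes ≈ 375 words
- 10 minutes ≈ 750 words
- 15 minutes ≈ 1,125 words
- 20 minutes ≈ 1,500 words
- 30 minutes ≈ 2,250 words

**Speaking rate:** this guide assumes an average pace of about 75 words per minute

**Practical limits:**
- Minimum: 2 minutes (about 150 words)
- Maximum: 30 minutes (about 2,250 words)
- Sweet spot: 5-15 minutes for the best engagement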

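## Workflow

### Step 1: Understand the Request

Parse the user's request for:
1. **Content type** (audiobook, podcast, educational, or inferred from the topic)
2. **Topic/subject** (what the content should be about)
3. **Target length** (how many minutes)
4. **Tone/style** (dramatic, casual, educational, and so on)
5. **Special requirements** (a specific voice, emphasis on certain points)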
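
The first three fields can often be pulled out with simple pattern matching. The sketch below is only an illustration: the keyword table, the `AudioRequest` dataclass, and `parse_request` are hypothetical names that are not part of the skill, and tone and special requirements are left for the agent to judge from context.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword table; the skill itself does not define one.
TYPE_KEYWORDS = {
    "audiobook": ("audiobook", "story", "chapter"),
    "podcast": ("podcast", "episode"),
    "educational": ("educational", "lesson", "explain", "guide"),
}


@dataclass
class AudioRequest:
    content_type: Optional[str]  # None means "infer from the topic"
    topic: Optional[str]
    minutes: Optional[int]


def parse_request(text: str) -> AudioRequest:
    lowered = text.lower()
    # Match keywords at a word start so "history" does not count as "story".
    content_type = next(
        (name for name, words in TYPE_KEYWORDS.items()
         if any(re.search(rf"\b{re.escape(w)}", lowered) for w in words)),
        None,
    )
    # "10-minute", "10 minutes", and "10 min" all yield 10.
    length = re.search(r"(\d+)[\s-]*min", lowered)
    minutes = int(length.group(1)) if length else None
    # Treat everything after "about" or "explaining" as the topic.
    subject = re.search(r"\b(?:about|explaining)\s+(.+)", text, flags=re.IGNORECASE)
    topic = subject.group(1).strip().rstrip(".") if subject else None
    return AudioRequest(content_type, topic, minutes)
```

Any field that comes back as `None` is a cue to infer the value or ask the user.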

### Step 2: Calculate the Word Count

```
target_words = target_minutes × 75
```

Example: 10 minutes = 10 × 75 = 750 words
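
As a minimal sketch, the same arithmetic in Python, combined with the 2 to 30 minute limits from the length guidelines. The function and constant names are illustrative, not part of the skill.

```python
WORDS_PER_MINUTE = 75  # planning rate used throughout this guide
MIN_MINUTES = 2
MAX_MINUTES = 30


def target_words(target_minutes: float) -> int:
    """Return the word budget for a requested duration."""
    if not MIN_MINUTES <= target_minutes <= MAX_MINUTES:
        raise ValueError(
            f"{target_minutes} minutes is outside the supported "
            f"{MIN_MINUTES}-{MAX_MINUTES} minute range"
        )
    return round(target_minutes * WORDS_PER_MINUTE)
```

Raising on out-of-range durations gives the agent a natural point to suggest splitting the content into several episodes.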

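### Step 3: Generate the Script

Write the full script according to these rules:

**Content guidelines:**
- Open strong with an engaging hook
- Keep the flow natural and conversational
- Use active voice and simple sentence structures
- Include relevant examples and stories
- End with a satisfying conclusion

**Formatting rules:**
- Add `[short pause]` after sentences (sparingly, not after every sentence)
- Add `[long pause]` between paragraphs or major sections
- Use voice effects strategically: `[whispers]`, `[shouts]`, `[excited]`, `[serious]`, `[sarcastic]`, `[sings]`, `[laughs]`
- Write numbers as words: "twenty-three" rather than "23"
- Spell out acronyms on first use: "AI, or artificial intelligence"
- Avoid complex punctuation (dashes work, but semicolons read poorly)
- Strip markdown formatting before TTS conversion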
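
The number rule is easy to automate for small values. The following is a sketch under stated assumptions, not the skill's own code: it rewrites only standalone integers from 0 to 999 and leaves decimals, comma-grouped figures, longer numbers, and identifiers such as `v2` untouched for manual review. `number_to_words` and `spell_out_numbers` are hypothetical helpers.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0-999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (f"-{ONES[ones]}" if ones else "")
    hundreds, rest = divmod(n, 100)
    words = f"{ONES[hundreds]} hundred"
    return words + (f" {number_to_words(rest)}" if rest else "")


def spell_out_numbers(text: str) -> str:
    """Replace standalone 1-3 digit integers with their spelled-out form."""
    # The lookarounds skip digits that belong to "1,500", "3.14", or "2024".
    pattern = r"(?<![\d.,])\b\d{1,3}\b(?![.,]?\d)"
    return re.sub(pattern, lambda m: number_to_words(int(m.group())), text)
```

For example, `spell_out_numbers("She was 23.")` returns `"She was twenty-three."`.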

### Step 4: Present the Script

Show the script to the user and ask:
```
Here's the [format] script I've created (approximately [length] minutes):

[Display the script]

Would you like me to:
1. Generate the audio now
2. Make changes to the script
3. Adjust the length or tone
```

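### Step 5: Handle User Feedback

If the user requests changes:
- Regenerate the script with the adjustments
- Keep to the target word count
- Present the revised version

If the user approves:
- Proceed to audio generation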

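### Step 6: Generate the Audio

**Format the script for TTS:**
1. Remove any remaining markdown (headings, bold, italics)
2. Make sure voice effects use the correct `[effect]` format
3. Check that pauses are placed appropriately
4. Verify that numbers and acronyms are spelled out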
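
The first item can be sketched with a few regular expressions. This is a rough illustration rather than a full markdown parser; `strip_markdown` is a hypothetical helper, and it deliberately leaves bracketed tags such as `[short pause]` in place.

```python
import re


def strip_markdown(script: str) -> str:
    """Remove headings, bold, italics, and inline code markers."""
    text = re.sub(r"^#{1,6}\s+", "", script, flags=re.MULTILINE)  # headings
    text = re.sub(r"(\*\*|__)(.+?)\1", r"\2", text)               # bold
    text = re.sub(r"(?<!\w)([*_])(.+?)\1(?!\w)", r"\2", text)     # italics
    text = re.sub(r"`([^`]*)`", r"\1", text)                      # inline code
    return text
```

Running it on `"**Intro:** Welcome [short pause] to *the* show"` yields `"Intro: Welcome [short pause] to the show"`.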

**Call the TTS script:**

**Important:** The `ELEVENLABS_API_KEY` environment variable is already configured on the system. Just call the TTS script directly.

```bash
uv run /home/clawdbot/clawdbot/skills/sag/scripts/tts.py \
  -o /tmp/audio-gen-[timestamp]-[topic-slug].mp3 \
  -m eleven_multilingual_v2 \
  "[formatted_script]"
```

**For long scripts, use a heredoc:**
```bash
uv run /home/clawdbot/clawdbot/skills/sag/scripts/tts.py \
  -o /tmp/audio-gen-[timestamp]-[topic-slug].mp3 \
  -m eleven_multilingual_v2 \
  "$(cat <<'EOF'
[formatted_script]
EOF
)"
```

**Return the result:**
```
MEDIA:/tmp/audio-gen-[timestamp]-[topic-slug].mp3

Your [format] is ready! [Brief description of content]. Duration: approximately [X] minutes.
```
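
The same invocation can be driven from Python. The sketch below assumes that `tts.py` accepts the script text as its final positional argument, as in the commands shown in this step; `topic_slug`, `build_tts_command`, and `generate_audio` are hypothetical helper names, not part of the skill.

```python
import re
import subprocess
import time

# Path and flags are taken from the commands shown in this step.
TTS_SCRIPT = "/home/clawdbot/clawdbot/skills/sag/scripts/tts.py"


def topic_slug(topic: str) -> str:
    """Lowercase the topic and join its words with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")


def build_tts_command(script: str, topic: str, timestamp: int,
                      model: str = "eleven_multilingual_v2"):
    """Return the argument list and the output path for one TTS run."""
    output = f"/tmp/audio-gen-{timestamp}-{topic_slug(topic)}.mp3"
    command = ["uv", "run", TTS_SCRIPT, "-o", output, "-m", model, script]
    return command, output


def generate_audio(script: str, topic: str) -> str:
    """Run the TTS script and return the MEDIA line for the reply."""
    command, output = build_tts_command(script, topic, int(time.time()))
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return f"MEDIA:{output}"
```

Passing the script as a list element avoids shell quoting altogether, so no heredoc is needed on this path regardless of script length.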

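## Voice Effects (SSML Tags)

Available voice modulation effects (use sparingly for added expressiveness):

- `[whispers]` - soft, intimate delivery
- `[shouts]` - loud, forceful delivery
- `[excited]` - enthusiastic, energetic tone
- `[serious]` - grave, solemn tone
- `[sarcastic]` - ironic, mocking tone
- `[sings]` - musical, melodic delivery
- `[laughs]` - amused, cheerful tone
- `[short pause]` - brief silence (about 0.5 seconds)
- `[long pause]` - extended silence (about 1-2 seconds)

**Best practices:**
- Use effects for emotional moments, not every sentence
- Pauses are your most powerful pacing tool
- Voice effects work best in audiobooks and dramatic content
- Keep podcasts and educational content mostly natural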
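
A quick check before audio generation can catch misspelled tags and overused effects. This is an illustrative sketch: the tag set comes from the list above, while `find_unknown_tags` and `effect_density` are hypothetical helpers, and what counts as "too many" effects is left to the caller.

```python
import re

EFFECT_TAGS = {"whispers", "shouts", "excited", "serious",
               "sarcastic", "sings", "laughs"}
PAUSE_TAGS = {"short pause", "long pause"}
TAG_PATTERN = r"\[([^\[\]]+)\]"


def find_unknown_tags(script: str) -> list[str]:
    """Return bracketed tags that are not in the supported list."""
    tags = re.findall(TAG_PATTERN, script)
    allowed = EFFECT_TAGS | PAUSE_TAGS
    return sorted({t for t in tags if t.lower() not in allowed})


def effect_density(script: str) -> float:
    """Return voice effects (pauses excluded) per 100 spoken words."""
    tags = re.findall(TAG_PATTERN, script)
    effects = sum(1 for t in tags if t.lower() in EFFECT_TAGS)
    words = len(re.sub(TAG_PATTERN, " ", script).split())
    return 100 * effects / words if words else 0.0
```

Note that structural placeholders such as `[topic]` are reported as unknown too, which is useful for spotting template text left in a final script.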

## Error Handling

### Script Too Long

If the generated script exceeds the target by more than 20%:
```
The script I generated is [X] words ([Y] minutes), which is longer than your target of [Z] minutes. Would you like me to:
1. Condense it to fit the target length
2. Split it into multiple parts
3. Keep it as is
```

### Script Too Short

If the generated script falls short of the target by more than 20%:
```
The script is [X] words ([Y] minutes), shorter than your target. Would you like me to:
1. Expand it with more detail
2. Add additional examples or stories
3. Generate as is
```
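
Both length checks reduce to one comparison against the word budget. A minimal sketch, assuming bracketed tags should not count as spoken words; `check_length` and its return labels are illustrative names.

```python
import re

WORDS_PER_MINUTE = 75  # planning rate used throughout this guide
TOLERANCE = 0.20       # allowed deviation from the target


def check_length(script: str, target_minutes: float) -> str:
    """Classify a script as 'too_long', 'too_short', or 'ok'."""
    spoken = re.sub(r"\[[^\[\]]+\]", " ", script)  # drop [pause]/[effect] tags
    words = len(spoken.split())
    target = target_minutes * WORDS_PER_MINUTE
    if words > target * (1 + TOLERANCE):
        return "too_long"
    if words < target * (1 - TOLERANCE):
        return "too_short"
    return "ok"
```

For a 10-minute target the accepted range is roughly 600 to 900 words.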

### TTS Generation Failed

If the TTS script fails:
```
I've created the script, but I'm unable to generate the audio right now. Here's your script:

[Display script]

Error: [specific error message]

You can:
1. Check that ELEVENLABS_API_KEY is configured
2. Use the script with your own text-to-speech tool
3. Try again in a moment
4. Ask me to troubleshoot the audio generation
```

**Common TTS issues:**
- API key not set: verify `ELEVENLABS_API_KEY` in the configuration
- Rate limit: wait a moment, then retry
- Text too long: split it into smaller chunks (about 5,000 characters at most)
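
For the "text too long" case, one way to split a script is at sentence boundaries so that no chunk cuts a sentence in half. This is a sketch, not the skill's own logic: `split_into_chunks` is a hypothetical helper, and the 5,000-character default simply mirrors the approximate limit quoted above.

```python
import re


def split_into_chunks(text: str, max_chars: int = 5000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to the TTS script separately, which produces one audio file per chunk.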

### Invalid Requests

For unrealistic requests (for example, "a 100-hour audiobook"):
```
That length would require [X] words and take significant time to generate. I recommend:
- Breaking it into multiple episodes/chapters
- Targeting 5-30 minutes per audio file
- Creating a series instead of one long file
```

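## Tips for Best Results

### Creating Engaging Audiobooks
- Focus on character emotions and sensory details
- Use pauses to build dramatic tension
- Vary sentence length to create rhythm
- Include inner monologue and reflection

### Creating Engaging Podcasts
- Open with a question or a surprising fact
- Use conversational phrases: "You know what's interesting..."
- Include relatable examples from everyday life
- End with actionable takeaways

### Creating Effective Educational Content
- Use the "explain like I'm five" approach
- Build from simple concepts to complex ones
- Repeat key terms and definitions
- Give multiple examples for clarity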

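## Technical Notes

**TTS implementation:**
- Uses a Python script: `~/.clawdbot/clawdbot/skills/sag/scripts/tts.py`
- No binary installation required (pure Python + requests)
- Calls the ElevenLabs API directly
- Compatible with Linux and macOS

**File storage:**
- Audio files are saved to `/tmp/audio-gen/`
- Filename format: `audio-gen-[timestamp]-[topic-slug].mp3`
- Files are cleaned up automatically after 24 hours

**API requirements:**
- Anthropic API for script generation (already configured)
- ElevenLabs API for text-to-speech (configured via `ELEVENLABS_API_KEY`)
- Both services must be configured and have available credits

**Supported models:**
- `eleven_multilingual_v2` - best quality (default)
- `eleven_turbo_v2` - faster generation
- `eleven_turbo_v2_5` - fastest generation
- `eleven_multilingual_v1` - legacy model

**Cost estimate:**
- 10 minutes of audio (about 750 words): about $1.43
  - Claude API: about $0.075
  - ElevenLabs: about $1.35
- Longer content scales proportionally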
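
Since cost is stated to scale proportionally, the 10-minute figures can serve as a baseline for other durations. A rough sketch: the two constants are the estimates quoted above, not current provider pricing, and `estimate_cost` is an illustrative name.

```python
# Baseline estimates for 10 minutes of audio, as quoted in this guide.
CLAUDE_COST_PER_10_MIN = 0.075
ELEVENLABS_COST_PER_10_MIN = 1.35


def estimate_cost(minutes: float) -> dict[str, float]:
    """Scale the 10-minute baseline linearly to the requested duration."""
    scale = minutes / 10
    claude = CLAUDE_COST_PER_10_MIN * scale
    elevenlabs = ELEVENLABS_COST_PER_10_MIN * scale
    return {"claude": claude, "elevenlabs": elevenlabs,
            "total": claude + elevenlabs}
```

At 10 minutes the total is $1.425, which matches the rounded $1.43 above; a 30-minute file comes to roughly three times that.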

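**Generation time:**
- Script generation: 5-30 seconds (depending on length)
- Audio generation: 5-15 seconds (ElevenLabs processing)
- Total: usually under 1 minute for 10 minutes of audio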

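## Limitations

1. **Maximum length:** 30 minutes per audio file (about 2,250 words)
   - For longer content, create multiple parts or episodes

2. **Single voice:** One narrator voice is currently supported
   - No multi-voice dialogue or character voices

3. **No background music:** Voice narration only
   - No background music, sound effects, or audio mixing

4. **Generated on demand:** Every request produces fresh content
   - No pre-made templates or cached audio

5. **Language:** Primarily English
   - ElevenLabs supports other languages, but content generation is optimized for English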

## Example Conversations

### Example 1: Quick Audiobook
```
User: Create a 5-minute audiobook chapter about a lonely robot learning to feel emotions

Agent: I'll create a 5-minute sci-fi audiobook chapter. Let me write the story...

[Generates ~400-word narrative with emotional arc]

Here's the story I've created about Unit-7, a maintenance robot who discovers the meaning of friendship through an unexpected encounter. Would you like me to generate the audio or make changes first?

User: Generate the audio

Agent: [Invokes SAG with formatted script]

MEDIA:/tmp/audio-gen-1234567890-lonely-robot.mp3

Your audiobook chapter is ready! The story follows Unit-7's emotional awakening through a touching encounter with a young engineer. Duration: approximately 5 minutes.
```

### Example 2: Podcast with Revisions
```
User: Make a 10-minute podcast about sustainable living tips

Agent: [Generates podcast script with intro, 5 practical tips, outro]

Here's a 10-minute podcast with 5 actionable sustainable living tips. The tone is encouraging and practical. Would you like me to:
1. Generate audio now
2. Make it more urgent/climate-focused
3. Add more specific examples
4. Adjust the length

User: Add more specific examples for each tip

Agent: [Regenerates with detailed examples]

Better? I've added specific examples like calculating your carbon footprint, composting at home, and choosing sustainable brands. Ready for audio?

User: Perfect!

Agent: MEDIA:/tmp/audio-gen-1234567890-sustainable-living.mp3

Your sustainable living podcast is ready with 5 practical tips, each with concrete examples you can start using today! Duration: 10 minutes.
```

### Example 3: Educational Content
```
User: Create a 15-minute educational audio explaining quantum computing for beginners

Agent: I'll create a beginner-friendly 15-minute explanation of quantum computing...

[Generates ~1,100 words with analogies, examples, and clear structure]

Script preview:
- Introduction: What is quantum computing? (2 min)
- Classical vs Quantum: The key differences (4 min)
- Qubits and Superposition: How it works (5 min)
- Real Applications: What it means for you (3 min)
- Conclusion: The future of computing (1 min)

Should I generate the audio?

User: Yes please

Agent: MEDIA:/tmp/audio-gen-1234567890-quantum-computing.mp3

Your educational podcast on quantum computing is ready! It uses everyday analogies to explain qubits, superposition, and real-world applications. Duration: approximately 15 minutes.
```

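## When to Use This Skill

Use the audio-gen skill when the user asks for things like:
- "Create an audiobook about..."
- "Make a podcast episode about..."
- "Generate audio content explaining..."
- "Record a story about..."
- "Produce an audio guide for..."
- "Create a narrated version of..."
- "Make an audio lesson about..."

**Do not use it for:**
- Transcribing existing audio (use the whisper skill instead)
- Converting user-written text to audio (use the SAG skill directly)
- Real-time voice interaction (use the voice-call plugin)
- Multi-speaker dialogue or interviews (a current limitation)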

## Advanced Usage

### Voice Selection

Users can request a specific voice:
```
User: Create a podcast with the Rachel voice

Agent: [Checks voice configuration for "Rachel" voice ID and uses it in SAG command]
```

### Series/Episodes

For multi-part content:
```
User: Create a 3-episode series about space exploration

Agent: I'll create Episode 1 first. Each episode will be 10 minutes. Let's start with "The Dawn of the Space Age"...

[After completion]

Episode 1 is ready! Would you like me to continue with Episode 2?
```

### Format Mixing

Blend formats to create a distinctive style:
```
User: Create an educational podcast that tells a story

Agent: [Generates content that combines storytelling narrative with educational explanations]
```

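## Troubleshooting

**Problem:** The audio sounds robotic or unnatural
**Solution:** Add more pauses and voice effects. Use contractions and conversational language.

**Problem:** The script length does not match the request
**Solution:** Regenerate with an explicit word-count target. Check the calculation (75 words per minute).

**Problem:** The content is too technical or too simple
**Solution:** Ask the user who the target audience is. Adjust the complexity accordingly.

**Problem:** The SAG command fails
**Solution:** Check that `ELEVENLABS_API_KEY` is set. Verify that the SAG skill is installed and working.

**Problem:** The user wants to edit the script by hand
**Solution:** Provide the script as plain text. The user can modify it and paste it back for audio generation.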

---

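💡 **Pro tip:** Always generate the script and get the user's approval before creating audio. This saves time and API costs, and makes sure users get exactly what they want.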
