Transcribe audio files via OpenRouter using audio-capable models


# OpenRouter Audio Transcription

Transcribe audio files using OpenRouter's chat completions API with the `input_audio` content type. Works with any audio-capable model.

## Quick Start

```bash
{baseDir}/scripts/transcribe.sh /path/to/audio.m4a
```

The transcript is printed to stdout.

## Common Options

```bash
# Custom model (default: google/gemini-2.5-flash)
{baseDir}/scripts/transcribe.sh audio.ogg --model openai/gpt-4o-audio-preview

# Custom instructions
{baseDir}/scripts/transcribe.sh audio.m4a --prompt "Transcribe with speaker labels"

# Save to file
{baseDir}/scripts/transcribe.sh audio.m4a --out /tmp/transcript.txt

# Custom caller identifier (for OpenRouter dashboard)
{baseDir}/scripts/transcribe.sh audio.m4a --title "MyApp"
```
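One way a script might parse these flags is sketched below. This is a hypothetical reconstruction, not the actual `transcribe.sh`: the function name `parse_args`, the variable names, and the default prompt are assumptions. Only the flag names, the default model, and the default title come from this README.

```bash
# Hypothetical sketch, not the real transcribe.sh: parse the documented flags
# into global variables. Variable names and the PROMPT default are assumptions.
parse_args() {
  MODEL="google/gemini-2.5-flash"   # default model, per this README
  TITLE="Peanut/Clawdbot"           # default X-Title, per this README
  PROMPT="Transcribe this audio."   # placeholder; the real default prompt is not documented
  OUT=""                            # empty means print to stdout
  INPUT=""

  while [ $# -gt 0 ]; do
    case "$1" in
      --model|--prompt|--out|--title)
        # Each of these flags takes exactly one value.
        if [ $# -lt 2 ]; then
          echo "Missing value for $1" >&2
          return 1
        fi
        case "$1" in
          --model)  MODEL="$2" ;;
          --prompt) PROMPT="$2" ;;
          --out)    OUT="$2" ;;
          --title)  TITLE="$2" ;;
        esac
        shift 2
        ;;
      -*)
        echo "Unknown option: $1" >&2
        return 1
        ;;
      *)
        INPUT="$1"
        shift
        ;;
    esac
  done

  # The audio file is the only required argument.
  if [ -z "$INPUT" ]; then
    echo "Usage: transcribe.sh <audio-file> [--model M] [--prompt P] [--out FILE] [--title T]" >&2
    return 1
  fi
}
```

Calling `parse_args "$@"` at the top of such a script leaves `$OUT` empty unless `--out` was given, which is what lets the script fall back to stdout.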

## How It Works

1. Convert the audio to WAV (mono, 16 kHz) with ffmpeg
2. Base64-encode the audio
3. Send it to the OpenRouter chat completions endpoint as `input_audio` content
4. Extract the transcript text from the response
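Steps 1 and 2 can be sketched as two small shell functions. The names `to_wav` and `encode_audio` are hypothetical, and the exact ffmpeg flags the real script passes are an assumption; only the target format (mono, 16 kHz WAV) comes from the list above.

```bash
# Hypothetical sketch of steps 1 and 2; function names are illustrative only.

# Step 1: convert any ffmpeg-readable input to mono (-ac 1), 16 kHz (-ar 16000) WAV.
# ffmpeg infers the WAV container from the ".wav" extension of the output path.
to_wav() {
  local input="$1" output="$2"
  # -y overwrites an existing output; -loglevel error keeps stderr quiet unless something fails.
  ffmpeg -y -loglevel error -i "$input" -ac 1 -ar 16000 "$output"
}

# Step 2: Base64-encode a file as a single line. Stripping newlines with tr keeps
# this portable across GNU and BSD/macOS base64, whose line-wrapping flags differ.
encode_audio() {
  base64 < "$1" | tr -d '\n'
}
```

Redirecting the encoded output to a file (for example `encode_audio "$wav" > "$dir/audio.b64"`) instead of capturing it in a shell variable keeps it ready for the file-based request building described under Troubleshooting.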

## API Key

Set the `OPENROUTER_API_KEY` environment variable, or configure the key in `~/.clawdbot/clawdbot.json`:

```json5
{
  skills: {
    "openrouter-transcribe": {
      apiKey: "YOUR_OPENROUTER_KEY"
    }
  }
}
```
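A minimal sketch of that lookup, under two assumptions this README does not state: the environment variable takes precedence over the config file, and the file on disk is strict JSON (jq cannot parse JSON5 extensions such as the unquoted keys in the example above). The function name `resolve_api_key` is hypothetical.

```bash
# Hypothetical helper: print the OpenRouter API key, or fail if none is found.
# Assumes OPENROUTER_API_KEY wins over the config file, and that the file is strict JSON.
resolve_api_key() {
  if [ -n "${OPENROUTER_API_KEY:-}" ]; then
    printf '%s\n' "$OPENROUTER_API_KEY"
    return 0
  fi

  local config="$HOME/.clawdbot/clawdbot.json" key=""
  if [ -f "$config" ]; then
    # `// empty` makes jq print nothing (instead of "null") when the path is missing.
    key=$(jq -r '.skills["openrouter-transcribe"].apiKey // empty' "$config" 2>/dev/null)
  fi

  if [ -z "$key" ]; then
    echo "No API key: set OPENROUTER_API_KEY or add it to $config" >&2
    return 1
  fi
  printf '%s\n' "$key"
}
```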

## Request Headers

The script sends identifying headers to OpenRouter:

- `X-Title`: caller name (default: "Peanut/Clawdbot")
- `HTTP-Referer`: referrer URL (default: "https://clawdbot.com")

These values appear in your OpenRouter dashboard so you can track usage by caller.
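A small sketch of how these headers could be collected as curl arguments. The function name `build_header_args` and the `HEADER_ARGS` array are hypothetical; the header names and defaults are the ones listed above.

```bash
# Hypothetical sketch: collect the identifying headers as curl arguments.
# Defaults come from this README; the function and array names are illustrative.
build_header_args() {
  local title="${1:-Peanut/Clawdbot}"
  local referer="${2:-https://clawdbot.com}"
  # A bash array keeps each "-H value" pair intact even when the value contains spaces.
  HEADER_ARGS=(
    -H "X-Title: $title"
    -H "HTTP-Referer: $referer"
  )
}
```

A caller would run `build_header_args "MyApp"` and then pass `"${HEADER_ARGS[@]}"` to curl alongside the other request options.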

## Troubleshooting

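**ffmpeg format errors**: The script uses a temporary directory rather than `mktemp -t file.wav`, because macOS's `mktemp` appends a random suffix after the `.wav` extension, so ffmpeg can no longer detect the output format from the file name.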
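A minimal sketch of the temporary-directory approach. The function name `make_work_dir` and the variables `WORK_DIR` and `WAV_PATH` are hypothetical. Because the script chooses the file name inside the directory itself, `.wav` stays the real extension.

```bash
# Hypothetical sketch of the temp-directory approach described above.
# An explicit XXXXXX template works with both GNU and BSD/macOS mktemp.
make_work_dir() {
  WORK_DIR=$(mktemp -d "${TMPDIR:-/tmp}/transcribe.XXXXXX") || return 1
  # The name inside the directory is fixed, so ffmpeg sees a ".wav" extension.
  WAV_PATH="$WORK_DIR/audio.wav"
  # Remove the directory and everything written into it when the script exits.
  trap 'rm -rf "$WORK_DIR"' EXIT
}
```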

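**Argument list too long**: Large audio files produce huge Base64 strings that can exceed the operating system's command-line argument limit (`ARG_MAX`). Instead of passing the data as command-line arguments, the script writes it to temporary files and has the tools read from them (`--rawfile` for jq, `@file` for curl).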
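A minimal sketch of the file-based approach, assuming jq 1.6 or later (for `--rawfile`). The function names are hypothetical, and the request body uses the OpenAI-style `input_audio` content part (`data` plus `format`), which is an assumption about what the real script sends. The sketch uses curl's `--data-binary @file` rather than `--data @file`, because `--data` strips newlines from file contents.

```bash
# Hypothetical sketch: build the request body from files so the Base64 data never
# becomes a command-line argument. Requires jq 1.6+ for --rawfile.
# Args: <base64-file> <model> <prompt> <payload-output-file>
build_payload() {
  local b64_file="$1" model="$2" prompt="$3" payload_file="$4"
  jq -c -n \
    --rawfile audio "$b64_file" \
    --arg model "$model" \
    --arg prompt "$prompt" \
    '{
      model: $model,
      messages: [{
        role: "user",
        content: [
          {type: "text", text: $prompt},
          {type: "input_audio",
           input_audio: {data: ($audio | rtrimstr("\n")), format: "wav"}}
        ]
      }]
    }' > "$payload_file"
}

# curl reads the body straight from the file, so it never passes through argv.
send_payload() {
  local payload_file="$1" api_key="$2"
  curl -sS https://openrouter.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $api_key" \
    -H "Content-Type: application/json" \
    --data-binary @"$payload_file"
}
```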

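**Empty response**: If you see "Empty response from API", the script dumps the raw response for debugging. Common causes:

- Invalid API key
- The model does not support audio input
- The audio file is too large or corrupted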
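A sketch of how the transcript could be extracted with that fallback. The function name `extract_transcript` is hypothetical, and writing the raw dump to stderr is an assumption; the `choices[0].message.content` path is the standard chat completions response shape.

```bash
# Hypothetical sketch: print the transcript, or report "Empty response from API"
# and dump the raw response to stderr for debugging.
extract_transcript() {
  local response="$1" text
  # `// empty` yields no output when content is null or the path is missing
  # (for example, when the API returns an error object instead of choices).
  text=$(printf '%s' "$response" | jq -r '.choices[0].message.content // empty' 2>/dev/null)
  if [ -z "$text" ]; then
    echo "Empty response from API" >&2
    printf '%s\n' "$response" >&2
    return 1
  fi
  printf '%s\n' "$text"
}
```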