# AudioPod AI
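A complete audio processing API: music generation, stem separation, TTS (text to speech), noise reduction, transcription, speaker separation, and wallet management.

## Installation

```bash
pip install audiopod   # Python
npm install audiopod   # Node.js
```

Authentication: set the `AUDIOPOD_API_KEY` environment variable or pass the key to the client constructor.

### Getting an API Key

1. Sign up at https://audiopod.ai/auth/signup (free, no credit card required)
2. Go to https://www.audiopod.ai/dashboard/account/api-keys
3. Click "Create API Key" and copy the key (it starts with `ap_`)
4. Top up your wallet at https://www.audiopod.ai/dashboard/account/wallet (pay as you go, no subscription)

```python
from audiopod import AudioPod

client = AudioPod()  # uses AUDIOPOD_API_KEY env var
# or: client = AudioPod(api_key="ap_...")
```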
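The key lookup can be sketched without the SDK. `resolve_api_key` is a hypothetical helper, not part of the `audiopod` package, and the precedence shown (explicit argument first, then the environment variable) is an assumption; only the `ap_` prefix comes from this document.

```python
import os


def resolve_api_key(api_key=None, env=None):
    """Return the API key from the argument or the AUDIOPOD_API_KEY variable.

    Hypothetical helper: the argument-before-environment order is assumed.
    """
    env = os.environ if env is None else env
    key = api_key or env.get("AUDIOPOD_API_KEY", "")
    if not key:
        raise ValueError("No API key: pass api_key or set AUDIOPOD_API_KEY")
    if not key.startswith("ap_"):
        raise ValueError("AudioPod API keys are expected to start with 'ap_'")
    return key
```

Failing early on a missing or malformed key gives a clearer error than a rejected request later on.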
---
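## AI Music Generation

Generate songs, rap, instrumentals, samples, and vocals from text prompts.

**Tasks:** `text2music` (songs with vocals), `text2rap` (rap), `prompt2instrumental` (instrumentals), `lyric2vocals` (vocals only), `text2samples` (loops/samples), `audio2audio` (style transfer), `songbloom`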
### Python SDK
```python
# Generate a full song with lyrics
result = client.music.song(
    prompt="Upbeat pop, synth, drums, 120 bpm, female vocals, radio-ready",
    lyrics="Verse 1:\nWalking down the street on a sunny day\n\nChorus:\nWe're on fire tonight!",
    duration=60
)
print(result["output_url"])

# Generate rap
result = client.music.rap(
    prompt="Lo-Fi Hip Hop, 100 BPM, male rap, melancholy, keyboard chords",
    lyrics="Verse 1:\nStarted from the bottom, now we climbing...",
    duration=60
)

# Generate instrumental (no lyrics needed)
result = client.music.instrumental(
    prompt="Atmospheric ambient soundscape, uplifting, driving mood",
    duration=30
)

# Generic generate with explicit task
result = client.music.generate(
    prompt="Electronic dance music, high energy",
    task="text2samples",  # any task type
    duration=30
)

# Async: submit then poll
job = client.music.create(
    prompt="Chill lofi beat",
    duration=30,
    task="prompt2instrumental"
)
result = client.music.wait_for_completion(job["id"], timeout=600)

# Get available genre presets
presets = client.music.get_presets()

# List/manage jobs
jobs = client.music.list(skip=0, limit=50)
job = client.music.get(job_id=123)
client.music.delete(job_id=123)
```
### cURL
```bash
# Song with lyrics
curl -X POST "https://api.audiopod.ai/api/v1/music/text2music" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"upbeat pop, synth, 120bpm, female vocals", "lyrics":"Walking down the street...", "audio_duration":60}'

# Rap
curl -X POST "https://api.audiopod.ai/api/v1/music/text2rap" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Lo-Fi Hip Hop, male rap, 100 BPM", "lyrics":"Started from the bottom...", "audio_duration":60}'

# Instrumental
curl -X POST "https://api.audiopod.ai/api/v1/music/prompt2instrumental" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"ambient soundscape, uplifting", "audio_duration":30}'

# Samples/loops
curl -X POST "https://api.audiopod.ai/api/v1/music/text2samples" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"drum loop, sad mood", "audio_duration":15}'

# Vocals only
curl -X POST "https://api.audiopod.ai/api/v1/music/lyric2vocals" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt":"clean vocals, happy", "lyrics":"Eternal chorus of unity...", "audio_duration":30}'

# Check job status / get result
curl "https://api.audiopod.ai/api/v1/music/jobs/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Get genre presets
curl "https://api.audiopod.ai/api/v1/music/presets" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# List jobs
curl "https://api.audiopod.ai/api/v1/music/jobs?skip=0&limit=50" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Delete job
curl -X DELETE "https://api.audiopod.ai/api/v1/music/jobs/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"
```
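### Parameters

| Field | Required | Description |
|-------|----------|-------------|
| prompt | Yes | Style/genre description |
| lyrics | Required for songs/rap/vocals | Lyrics with verse/chorus structure |
| audio_duration | No | Duration in seconds (default: 30) |
| genre_preset | No | Genre preset name (from the presets endpoint) |
| display_name | No | Display name for the track |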
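The table's rules can be checked before a request is sent. The sketch below is a hypothetical helper, not part of the SDK; it assumes that "songs/rap/vocals" corresponds to the `text2music`, `text2rap`, and `lyric2vocals` tasks, and it only builds the URL and JSON body without sending anything.

```python
import json

# Assumption: these are the tasks the table means by "songs/rap/vocals".
LYRICS_REQUIRED = {"text2music", "text2rap", "lyric2vocals"}
BASE_URL = "https://api.audiopod.ai/api/v1/music"


def build_music_request(task, prompt, lyrics=None, audio_duration=30, **optional):
    """Return (url, json_body) for a raw music generation request."""
    if not prompt:
        raise ValueError("prompt is required")
    if task in LYRICS_REQUIRED and not lyrics:
        raise ValueError(f"lyrics are required for task {task!r}")
    payload = {"prompt": prompt, "audio_duration": audio_duration}
    if lyrics:
        payload["lyrics"] = lyrics
    # Optional fields listed in the table.
    for name in ("genre_preset", "display_name"):
        if optional.get(name) is not None:
            payload[name] = optional[name]
    return f"{BASE_URL}/{task}", json.dumps(payload)
```

The returned body can be passed to any HTTP client together with the `X-API-Key` and `Content-Type: application/json` headers used in the cURL examples.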
---
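## Stem Separation

Split audio into separate instrument and vocal stems.

### Modes

| Mode | Stems | Output | Use case |
|------|-------|--------|----------|
| single | 1 | Specified stem only | Vocal isolation, drum extraction |
| two | 2 | Vocals + instrumental | Karaoke tracks |
| four | 4 | Vocals, drums, bass, other | Standard mixing (default) |
| six | 6 | + guitar, piano | Full instrument separation |
| producer | 8 | + kick, snare, hi-hat | Beat production |
| studio | 12 | + cymbals, sub-bass, synth | Professional mixing |
| mastering | 16 | Maximum detail | Forensic analysis |

**Single-stem options:** vocals, drums, bass, guitar, piano, other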
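The mode table and the single-stem options translate directly into a lookup that validates form fields before upload. This is a minimal sketch with hypothetical names (`STEM_COUNTS`, `extraction_fields`); the values are copied from the table, and the `mode` and `stem` field names follow the cURL examples in this section.

```python
# Number of stems produced by each mode, as listed in the table.
STEM_COUNTS = {
    "single": 1, "two": 2, "four": 4, "six": 6,
    "producer": 8, "studio": 12, "mastering": 16,
}
SINGLE_STEMS = ("vocals", "drums", "bass", "guitar", "piano", "other")


def extraction_fields(mode="four", stem=None):
    """Return the form fields for a stem extraction request."""
    if mode not in STEM_COUNTS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(STEM_COUNTS)}")
    fields = {"mode": mode}
    if mode == "single":
        # Single mode needs to know which stem to isolate.
        if stem not in SINGLE_STEMS:
            raise ValueError(f"mode 'single' needs stem in {SINGLE_STEMS}")
        fields["stem"] = stem
    return fields
```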
### Python SDK
```python
# Sync: extract and wait for result
result = client.stems.separate(
    url="https://youtube.com/watch?v=VIDEO_ID",
    mode="six",
    timeout=600
)
for stem, url in result["download_urls"].items():
    print(f"{stem}: {url}")

# From local file
result = client.stems.separate(file="/path/to/song.mp3", mode="four")

# Single stem extraction
result = client.stems.separate(
    url="https://youtube.com/watch?v=ID",
    mode="single",
    stem="vocals"
)

# Async: submit then poll
job = client.stems.extract(url="https://youtube.com/watch?v=ID", mode="six")
print(f"Job ID: {job['id']}")
status = client.stems.status(job["id"])
# or wait:
result = client.stems.wait_for_completion(job["id"], timeout=600)

# List available modes
modes = client.stems.modes()

# Job management
jobs = client.stems.list(skip=0, limit=50, status="COMPLETED")
job = client.stems.get(job_id=1234)
client.stems.delete(job_id=1234)
```
### cURL
```bash
# Extract from URL
curl -X POST "https://api.audiopod.ai/api/v1/stem-extraction/api/extract" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -F "url=https://youtube.com/watch?v=VIDEO_ID" \
  -F "mode=six"

# Extract from file
curl -X POST "https://api.audiopod.ai/api/v1/stem-extraction/api/extract" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -F "file=@/path/to/song.mp3" \
  -F "mode=four"

# Single stem
curl -X POST "https://api.audiopod.ai/api/v1/stem-extraction/api/extract" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -F "url=URL" \
  -F "mode=single" \
  -F "stem=vocals"

# Check job status
curl "https://api.audiopod.ai/api/v1/stem-extraction/status/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# List available modes
curl "https://api.audiopod.ai/api/v1/stem-extraction/modes" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# List jobs (filter by status: PENDING, PROCESSING, COMPLETED, FAILED)
curl "https://api.audiopod.ai/api/v1/stem-extraction/jobs?skip=0&limit=50&status=COMPLETED" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Get specific job
curl "https://api.audiopod.ai/api/v1/stem-extraction/jobs/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Delete job
curl -X DELETE "https://api.audiopod.ai/api/v1/stem-extraction/jobs/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"
```
### Response Format

```json
{
  "id": 1234,
  "status": "COMPLETED",
  "download_urls": {
    "vocals": "https://...",
    "drums": "https://...",
    "bass": "https://...",
    "other": "https://..."
  },
  "quality_scores": {
    "vocals": 0.95,
    "drums": 0.88
  }
}
```
---
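## Text to Speech

Generate speech from text with 50+ voices in 60+ languages. Voice cloning is supported.

### Voice Types

- **50+ ready-made voices**: multilingual, covering 60+ languages with automatic language detection
- **Custom clones**: clone any voice from an audio sample of about 5 seconds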
### Python SDK
```python
# Generate speech and wait for result
result = client.voice.generate(
    text="Hello, world! This is a test.",
    voice_id=123,
    speed=1.0
)
print(result["output_url"])

# Async: submit then poll
job = client.voice.speak(
    text="Hello world",
    voice_id=123,
    speed=1.0
)
status = client.voice.get_job(job["id"])
result = client.voice.wait_for_completion(job["id"], timeout=300)

# List all available voices
voices = client.voice.list()
for v in voices:
    print(f"{v['id']}: {v['name']}")

# Clone a voice (needs ~5 sec audio sample)
new_voice = client.voice.create(
    name="My Voice Clone",
    audio_file="./sample.mp3",
    description="Cloned from recording"
)

# Get/delete voice
voice = client.voice.get(voice_id=123)
client.voice.delete(voice_id=123)
```
### cURL (Raw HTTP, most reliable)
```bash
# List all voices
curl "https://api.audiopod.ai/api/v1/voice/voice-profiles" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Generate speech (FORM DATA, not JSON!)
curl -X POST "https://api.audiopod.ai/api/v1/voice/voices/{VOICE_UUID}/generate" \
  -H "Authorization: Bearer $AUDIOPOD_API_KEY" \
  -d "input_text=Hello world, this is a test" \
  -d "audio_format=mp3" \
  -d "speed=1.0"

# Poll job status
curl "https://api.audiopod.ai/api/v1/voice/tts-jobs/{JOB_ID}/status" \
  -H "Authorization: Bearer $AUDIOPOD_API_KEY"

# SDK-style endpoints (alternative)
# Generate via SDK endpoint
curl -X POST "https://api.audiopod.ai/api/v1/voice/tts/generate" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello world","voice_id":123,"speed":1.0}'

# Poll via SDK endpoint
curl "https://api.audiopod.ai/api/v1/voice/tts/status/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# List voices (SDK endpoint)
curl "https://api.audiopod.ai/api/v1/voice/voices" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Clone a voice
curl -X POST "https://api.audiopod.ai/api/v1/voice/voices" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -F "name=My Voice" \
  -F "[email protected]" \
  -F "description=Cloned voice"

# Delete voice
curl -X DELETE "https://api.audiopod.ai/api/v1/voice/voices/VOICE_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"
```
### Generation Parameters

| Field | Required | Description |
|-------|----------|-------------|
| input_text | Yes | Text to speak (max 5000 characters). Raw HTTP uses `input_text`; the SDK uses `text` |
| audio_format | No | mp3, wav, ogg (default: mp3) |
| speed | No | 0.25 - 4.0 (default: 1.0) |
| language | No | ISO code; auto-detected if omitted |
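Since the raw generate endpoint expects form data, the limits in the table can be enforced while encoding the body. The sketch below uses only the standard library; `build_tts_form` is a hypothetical helper, and whether the server rejects or clamps out-of-range values is not stated in this document.

```python
from urllib.parse import urlencode

ALLOWED_FORMATS = {"mp3", "wav", "ogg"}
MAX_TEXT_LENGTH = 5000


def build_tts_form(input_text, audio_format="mp3", speed=1.0, language=None):
    """Return an application/x-www-form-urlencoded body for raw TTS generation."""
    if not input_text:
        raise ValueError("input_text is required")
    if len(input_text) > MAX_TEXT_LENGTH:
        raise ValueError(f"input_text exceeds {MAX_TEXT_LENGTH} characters")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"audio_format must be one of {sorted(ALLOWED_FORMATS)}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    fields = {"input_text": input_text, "audio_format": audio_format, "speed": speed}
    if language:
        # Omitting language leaves detection to the service.
        fields["language"] = language
    return urlencode(fields)
```

The resulting string is what `curl -d` would send for the same fields.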
### Response Format

```json
// Generate response
{"job_id": 12345, "status": "pending", "credits_reserved": 25}

// Status response (completed)
{"status": "completed", "output_url": "https://r2-url/generated.mp3"}
```
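The two responses imply a submit-then-poll loop: keep fetching the status until it reads `completed`. A minimal sketch follows, with the HTTP call injected as `fetch_status` so the loop works with any client. The helper name is hypothetical, and treating `failed` as a terminal error state is an assumption, since only `pending` and `completed` appear above.

```python
import time


def wait_for_job(fetch_status, timeout=300.0, interval=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the job completes, fails, or times out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        # Lowercase so both "completed" and "COMPLETED" spellings match.
        state = str(status.get("status", "")).lower()
        if state == "completed":
            return status
        if state == "failed":  # assumed failure state
            raise RuntimeError(f"job failed: {status}")
        if clock() >= deadline:
            raise TimeoutError(f"job still {state or 'unknown'} after {timeout}s")
        sleep(interval)
```

In practice `fetch_status` would wrap a GET on the status endpoint and return the parsed JSON.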
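### Important Notes

- The raw HTTP generate endpoint takes **form data**, not JSON. The field is `input_text`, not `text`
- The SDK endpoint (`/api/v1/voice/tts/generate`) takes JSON, and the field is `text`
- The output file may be a WAV file disguised as .mp3. Convert it with `ffmpeg -i output.mp3 -c:a aac real.m4a`
- Each generation costs about 55 credits, billed against the wallet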
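Whether a downloaded file is really WAV can be checked from its first bytes before deciding to run ffmpeg. The sketch below relies on standard container signatures (a RIFF/WAVE header for WAV, an ID3 tag or MPEG frame sync for MP3); the function names are illustrative and not part of the SDK.

```python
def sniff_audio_container(header: bytes) -> str:
    """Classify the first bytes of a file as 'wav', 'mp3', or 'unknown'."""
    # WAV: "RIFF" <4-byte size> "WAVE"
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # MP3 with an ID3v2 tag at the start.
    if header[:3] == b"ID3":
        return "mp3"
    # Bare MPEG audio frame: the first 11 bits are all ones (frame sync).
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"


def sniff_file(path):
    """Read the first 12 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_audio_container(fh.read(12))
```

If `sniff_file("output.mp3")` returns `"wav"`, the ffmpeg conversion above applies.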
---
## Speaker Separation

Separate audio by speaker using automatic speaker diarization.
### Python SDK
```python
# Diarize and wait for result
result = client.speaker.identify(
    file="./meeting.mp3",
    num_speakers=3,  # optional hint for accuracy
    timeout=600
)
for segment in result["segments"]:
    print(f"Speaker {segment['speaker']}: {segment['text']} [{segment['start']:.1f}s - {segment['end']:.1f}s]")

# From URL
result = client.speaker.identify(
    url="https://youtube.com/watch?v=VIDEO_ID",
    num_speakers=2
)

# Async: submit then poll
job = client.speaker.diarize(
    file="./meeting.mp3",
    num_speakers=3
)
result = client.speaker.wait_for_completion(job["id"], timeout=600)

# Job management
jobs = client.speaker.list(skip=0, limit=50, status="COMPLETED")
job = client.speaker.get(job_id=123)
client.speaker.delete(job_id=123)
```
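Once the segments are available, a common next step is totalling how long each speaker talked. The sketch below assumes each segment is a dict with the `speaker`, `start`, and `end` keys used in the print loop of the example; `talk_time_by_speaker` is a hypothetical helper.

```python
from collections import defaultdict


def talk_time_by_speaker(segments):
    """Sum segment durations (in seconds) per speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        # Guard against malformed segments whose end precedes start.
        totals[seg["speaker"]] += max(0.0, seg["end"] - seg["start"])
    return dict(totals)
```

For example, `talk_time_by_speaker(result["segments"])` would return a mapping from speaker label to seconds spoken.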
### cURL
```bash
# Diarize from file
curl -X POST "https://api.audiopod.ai/api/v1/speaker/diarize" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -F "[email protected]" \
  -F "num_speakers=3"

# Diarize from URL
curl -X POST "https://api.audiopod.ai/api/v1/speaker/diarize" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -F "url=https://youtube.com/watch?v=VIDEO_ID" \
  -F "num_speakers=2"

# Check job status
curl "https://api.audiopod.ai/api/v1/speaker/jobs/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# List jobs
curl "https://api.audiopod.ai/api/v1/speaker/jobs?skip=0&limit=50" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Delete job
curl -X DELETE "https://api.audiopod.ai/api/v1/speaker/jobs/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"
```
---
## Speech to Text (Transcription)

Transcribe audio and video with speaker diarization, word-level timestamps, and multiple output formats.
### Python SDK
```python
# Transcribe URL and wait
result = client.transcription.transcribe(
    url="https://youtube.com/watch?v=VIDEO_ID",
    speaker_diarization=True,
    min_speakers=2,
    max_speakers=5,
    timeout=600
)
print(f"Language: {result['detected_language']}")
for seg in result["segments"]:
    print(f"[{seg['start']:.1f}s] {seg.get('speaker','?')}: {seg['text']}")

# Batch: multiple URLs at once
result = client.transcription.transcribe(
    urls=["https://youtube.com/watch?v=ID1", "https://youtube.com/watch?v=ID2"],
    speaker_diarization=True
)

# Upload local file
job = client.transcription.upload(
    file_path="./recording.mp3",
    language="en",
    speaker_diarization=True
)
result = client.transcription.wait_for_completion(job["id"], timeout=600)

# Async: submit then poll
job = client.transcription.create(
    url="https://youtube.com/watch?v=ID",
    language="en",
    speaker_diarization=True,
    word_timestamps=True,
    min_speakers=2,
    max_speakers=4
)
result = client.transcription.wait_for_completion(job["id"], timeout=600)

# Get transcript in different formats
transcript_json = client.transcription.get_transcript(job_id=123, format="json")
transcript_srt = client.transcription.get_transcript(job_id=123, format="srt")
transcript_vtt = client.transcription.get_transcript(job_id=123, format="vtt")
transcript_txt = client.transcription.get_transcript(job_id=123, format="txt")

# Job management
jobs = client.transcription.list(skip=0, limit=50, status="COMPLETED")
job = client.transcription.get(job_id=123)
client.transcription.delete(job_id=123)
```
### cURL
```bash
# Transcribe from URL
curl -X POST "https://api.audiopod.ai/api/v1/transcribe/transcribe" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://youtube.com/watch?v=ID","enable_speaker_diarization":true,"word_timestamps":true}'

# Transcribe multiple URLs
curl -X POST "https://api.audiopod.ai/api/v1/transcribe/transcribe" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"urls":["URL1","URL2"],"enable_speaker_diarization":true}'

# Upload file for transcription
curl -X POST "https://api.audiopod.ai/api/v1/transcribe/transcribe-upload" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -F "[email protected]" \
  -F "language=en" \
  -F "enable_speaker_diarization=true"

# Get job status
curl "https://api.audiopod.ai/api/v1/transcribe/jobs/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Get transcript in specific format (json, srt, vtt, txt)
curl "https://api.audiopod.ai/api/v1/transcribe/jobs/JOB_ID/transcript?format=srt" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# List jobs
curl "https://api.audiopod.ai/api/v1/transcribe/jobs?offset=0&limit=50" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Delete job
curl -X DELETE "https://api.audiopod.ai/api/v1/transcribe/jobs/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"
```
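### Parameters

| Field | Required | Description |
|-------|----------|-------------|
| url / urls | Yes (or file) | URL(s) to transcribe (YouTube, SoundCloud, direct links) |
| language | No | ISO 639-1 code; auto-detected if omitted |
| enable_speaker_diarization | No | Enable speaker identification (default: false) |
| min_speakers / max_speakers | No | Speaker-count hints for better diarization |
| word_timestamps | No | Enable word-level timestamps (default: true) |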
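The SDK examples pass `speaker_diarization`, while the raw JSON body uses `enable_speaker_diarization`. The sketch below maps SDK-style arguments onto the raw field names from the table. `build_transcribe_body` is a hypothetical helper, and the `min_speakers <= max_speakers` check is an assumption rather than a documented rule.

```python
import json


def build_transcribe_body(urls, language=None, speaker_diarization=False,
                          word_timestamps=True, min_speakers=None, max_speakers=None):
    """Return the JSON body for a raw URL transcription request."""
    # A single string becomes "url"; any other iterable becomes "urls".
    body = {"url": urls} if isinstance(urls, str) else {"urls": list(urls)}
    if language:
        body["language"] = language
    body["enable_speaker_diarization"] = bool(speaker_diarization)
    body["word_timestamps"] = bool(word_timestamps)
    if (min_speakers is not None and max_speakers is not None
            and min_speakers > max_speakers):
        raise ValueError("min_speakers must not exceed max_speakers")
    for name, value in (("min_speakers", min_speakers), ("max_speakers", max_speakers)):
        if value is not None:
            body[name] = value
    return json.dumps(body)
```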
### Output Formats

- **json**: full structured output with segments, timestamps, and speakers
- **srt**: SubRip subtitle format
- **vtt**: WebVTT subtitle format
- **txt**: plain-text transcript
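To illustrate how the json and srt outputs relate, the sketch below renders segments as SubRip cues locally. It assumes segments shaped like those in the Python example (`start`, `end`, `text`, optional `speaker`); the speaker prefix is a choice made here, and the service's own `srt` output may be laid out differently.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        speaker = seg.get("speaker")
        text = f"{speaker}: {seg['text']}" if speaker else seg["text"]
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)
```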
---
## Noise Reduction

Remove background noise from audio and video files.
### Python SDK
```python
# Denoise and wait for result
result = client.denoiser.denoise(file="./noisy-audio.mp3", timeout=600)
print(f"Clean audio: {result['output_url']}")

# From URL
result = client.denoiser.denoise(url="https://example.com/noisy.mp3")

# Async: submit then poll
job = client.denoiser.create(file="./noisy-audio.mp3")
result = client.denoiser.wait_for_completion(job["id"], timeout=600)

# From URL (async)
job = client.denoiser.create(url="https://example.com/noisy.mp3")

# Job management
jobs = client.denoiser.list(skip=0, limit=50, status="COMPLETED")
job = client.denoiser.get(job_id=123)
client.denoiser.delete(job_id=123)
```
### cURL
```bash
# Denoise from file
curl -X POST "https://api.audiopod.ai/api/v1/denoiser/denoise" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -F "[email protected]"

# Denoise from URL
curl -X POST "https://api.audiopod.ai/api/v1/denoiser/denoise" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -F "url=https://example.com/noisy.mp3"

# Check job status
curl "https://api.audiopod.ai/api/v1/denoiser/jobs/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# List jobs
curl "https://api.audiopod.ai/api/v1/denoiser/jobs?skip=0&limit=50" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Delete job
curl -X DELETE "https://api.audiopod.ai/api/v1/denoiser/jobs/JOB_ID" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"
```
---
## Wallet & Billing

Check your balance, estimate costs, and review usage history.
### Python SDK
```python
# Get current balance
balance = client.wallet.get_balance()
print(f"Balance: ${balance['balance_usd']}")

# Check if balance is sufficient for an operation
check = client.wallet.check_balance(
    service_type="stem_extraction",
    duration_seconds=180
)
print(f"Sufficient: {check['sufficient']}")

# Estimate cost before running
estimate = client.wallet.estimate_cost(
    service_type="transcription",
    duration_seconds=300
)
print(f"Cost: ${estimate['cost_usd']}")

# Get pricing for all services
pricing = client.wallet.get_pricing()

# View usage history
usage = client.wallet.get_usage(page=1, limit=50)
```
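The balance and estimate responses can be combined into a local pre-flight check before submitting a long job. This sketch assumes the `balance_usd` and `cost_usd` keys shown in the example hold plain numbers or numeric strings; `can_afford` is a hypothetical helper and does not replace the `check_balance` call.

```python
from decimal import Decimal


def can_afford(balance, estimate):
    """Return (ok, remaining) given balance and cost-estimate responses."""
    # Go through str() so float inputs do not carry binary rounding noise.
    available = Decimal(str(balance["balance_usd"]))
    cost = Decimal(str(estimate["cost_usd"]))
    return available >= cost, available - cost
```

A negative `remaining` value shows how much to top up before retrying.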
### cURL
```bash
# Get balance
curl "https://api.audiopod.ai/api/v1/api-wallet/balance" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Check balance sufficiency
curl -X POST "https://api.audiopod.ai/api/v1/api-wallet/check-balance" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"service_type":"stem_extraction","duration_seconds":180}'

# Estimate cost
curl -X POST "https://api.audiopod.ai/api/v1/api-wallet/estimate-cost" \
  -H "X-API-Key: $AUDIOPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"service_type":"transcription","duration_seconds":300}'

# Get pricing
curl "https://api.audiopod.ai/api/v1/api-wallet/pricing" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"

# Usage history
curl "https://api.audiopod.ai/api/v1/api-wallet/usage?page=1&limit=50" \
  -H "X-API-Key: $AUDIOPOD_API_KEY"
```
---
## API Endpoint Overview

| Service | Endpoint | Method |
|---------|----------|--------|
| **Music** | `/api/v1/music/{task}` | POST |
| Music jobs | `/api/v1/music/jobs/{id}` | GET/DELETE |
| Music presets | `/api/v1/music/presets` | GET |
| **Stems** | `/api/v1/stem-extraction/api/extract` | POST (multipart) |
| Stems status | `/api/v1/stem-extraction/status/{id}` | GET |
| Stems modes | `/api/v1/stem-extraction/modes` | GET |
| Stems jobs | `/api/v1/stem-extraction/jobs` | GET |
| **TTS** generate | `/api/v1/voice/voices/{uuid}/generate` | POST (form data) |
| TTS generate (SDK) | `/api/v1/voice/tts/generate` | POST (JSON) |
| TTS status | `/api/v1/voice/tts-jobs/{id}/status` | GET |
| TTS status (SDK) | `/api/v1/voice/tts/status/{id}` | GET |
| Voice list | `/api/v1/voice/voice-profiles` | GET |
| Voice list (SDK) | `/api/v1/voice/voices` | GET |
| **Speaker** | `/api/v1/speaker/diarize` | POST (multipart) |
| Speaker jobs | `/api/v1/speaker/jobs/{id}` | GET/DELETE |
| **Transcribe** URL | `/api/v1/transcribe/transcribe` | POST (JSON) |
| Transcribe upload | `/api/v1/transcribe/transcribe-upload` | POST (multipart) |
| Transcript output | `/api/v1/transcribe/jobs/{id}/transcript?format=` | GET |
| Transcribe jobs | `/api/v1/transcribe/jobs` | GET |
| **Denoise** | `/api/v1/denoiser/denoise` | POST (multipart) |
| Denoise jobs | `/api/v1/denoiser/jobs/{id}` | GET/DELETE |
| **Wallet** balance | `/api/v1/api-wallet/balance` | GET |
| Wallet pricing | `/api/v1/api-wallet/pricing` | GET |
| Wallet usage | `/api/v1/api-wallet/usage` | GET |
## Auth Headers

Both authentication methods work:

- `X-API-Key: ap_...` for most endpoints
- `Authorization: Bearer ap_...` for TTS generate/status
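Choosing the header per endpoint can be sketched as a small lookup. The path patterns below are inferred from the cURL examples in this document (the raw TTS generate and status routes use `Bearer`, everything else uses `X-API-Key`); the function is hypothetical and the real server may accept either header more broadly.

```python
def auth_headers(path, api_key):
    """Return the auth header dict that the cURL examples use for a path."""
    # Raw TTS routes: /voice/voices/{uuid}/generate and /voice/tts-jobs/{id}/status
    is_raw_tts_generate = (
        path.startswith("/api/v1/voice/voices/") and path.endswith("/generate")
    )
    is_raw_tts_status = (
        path.startswith("/api/v1/voice/tts-jobs/") and path.endswith("/status")
    )
    if is_raw_tts_generate or is_raw_tts_status:
        return {"Authorization": f"Bearer {api_key}"}
    return {"X-API-Key": api_key}
```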
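## Known Issues

- SDK method signatures may differ from the raw API. When in doubt, use the cURL examples
- TTS output is stored on Cloudflare R2; download it via the `output_url` in the job status
- TTS output files may be WAV files disguised as .mp3. Convert them with ffmpeg before sending via WhatsApp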