Edge TTS

Text-to-speech conversion using the node-edge-tts npm package to generate audio from text. Supports multiple voices, languages, speed adjustment, pitch control, and subtitle generation.


# Edge-TTS Skill

## Overview

Generate high-quality text-to-speech audio using Microsoft Edge's neural TTS service via the node-edge-tts npm package. Supports multiple languages, voices, adjustable speed/pitch, and subtitle generation.

## Quick Start

When you detect TTS intent from a trigger or a user request:

1. **Call the `tts` tool** (Clawdbot built-in) to convert the text to speech.
2. The tool returns a `MEDIA:` path.
3. Clawdbot routes the audio to the current channel.

```javascript
// Example: built-in tts tool usage
tts("Your text to convert to speech")
// Returns: MEDIA: /path/to/audio.mp3
```
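If a caller needs the file path itself rather than relying on Clawdbot's routing, it could pull the path out of the tool's reply. The sketch below assumes the reply contains a single line of the form `MEDIA: <path>`, as in the example above; `parseMediaPath` is a hypothetical helper, not part of Clawdbot.

```javascript
// Hypothetical helper: extract the audio path from a reply such as
// "MEDIA: /path/to/audio.mp3". Assumes at most one MEDIA: line per reply.
function parseMediaPath(reply) {
  // Multiline mode so the MEDIA: line may appear anywhere in the reply;
  // trailing spaces, tabs, or a carriage return are not part of the path.
  const match = /^MEDIA:[ \t]*(\S.*?)[ \t\r]*$/m.exec(reply);
  return match ? match[1] : null; // null when no MEDIA: line is present
}

console.log(parseMediaPath('MEDIA: /path/to/audio.mp3')); // /path/to/audio.mp3
```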

## Trigger Detection

Treat the keyword "tts" as a TTS request. The skill automatically filters TTS-related keywords out of the text before conversion so that the trigger words themselves are not spoken.
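A minimal sketch of that filtering, assuming whole-word, case-insensitive matching on the keywords listed under Notes (`tts`, `TTS`, `text-to-speech`). `stripTtsKeywords` is a hypothetical name, and the skill's actual matching rules may differ.

```javascript
// Hypothetical sketch: remove trigger keywords before synthesis.
// Whole words only (so "attested" is untouched), case-insensitive,
// and an optional trailing ":" or "," is removed with the keyword.
const TRIGGER_PATTERN = /\b(?:text-to-speech|tts)\b[:,]?/gi;

function stripTtsKeywords(text) {
  return text
    .replace(TRIGGER_PATTERN, ' ')
    .replace(/\s+/g, ' ') // collapse the gaps left behind
    .trim();
}

console.log(stripTtsKeywords('tts: Good morning, team!')); // Good morning, team!
```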

## Advanced Customization

### Using the Node.js Scripts

For more control, use the bundled scripts directly:

#### TTS Converter

```bash
cd scripts
npm install
node tts-converter.js "Your text" --voice en-US-AriaNeural --rate +10% --output output.mp3
```

**Options:**
- `--voice, -v`: Voice name (default: `en-US-AriaNeural`)
- `--lang, -l`: Language code (e.g., `en-US`, `es-ES`)
- `--format, -o`: Output format (default: `audio-24khz-48kbitrate-mono-mp3`)
- `--pitch`: Pitch adjustment (e.g., `+10%`, `-20%`, `default`)
- `--rate, -r`: Rate adjustment (e.g., `+10%`, `-20%`, `default`)
- `--volume`: Volume adjustment (e.g., `+0%`, `-10%`, `default`)
- `--save-subtitles, -s`: Save subtitles as a JSON file
- `--output, -f`: Output file path (default: `tts_output.mp3`)
- `--proxy, -p`: Proxy URL (e.g., `http://localhost:7890`)
- `--timeout`: Request timeout in milliseconds (default: `10000`)
- `--list-voices, -L`: List available voices
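The flags above map naturally onto an options object for node-edge-tts. This sketch parses them with Node's built-in `util.parseArgs` (Node 18.3 or later) instead of `commander`, purely to stay dependency-free; `parseConverterArgs` is hypothetical, and the camelCase keys under `ttsOptions` are an assumption about what the script hands to the library.

```javascript
const { parseArgs } = require('node:util');

// Sketch of the documented tts-converter.js flags. Uses util.parseArgs in
// place of commander (an assumption made to avoid dependencies).
const OPTIONS = {
  voice: { type: 'string', short: 'v' },
  lang: { type: 'string', short: 'l' },
  format: { type: 'string', short: 'o' },
  pitch: { type: 'string' },
  rate: { type: 'string', short: 'r' },
  volume: { type: 'string' },
  'save-subtitles': { type: 'boolean', short: 's' },
  output: { type: 'string', short: 'f' },
  proxy: { type: 'string', short: 'p' },
  timeout: { type: 'string' },
  'list-voices': { type: 'boolean', short: 'L' },
};

function parseConverterArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    options: OPTIONS,
    allowPositionals: true, // the text to speak is a positional argument
  });
  return {
    text: positionals.join(' '),
    listVoices: values['list-voices'] ?? false,
    output: values.output ?? 'tts_output.mp3',
    // Assumed to mirror the node-edge-tts constructor option names.
    ttsOptions: {
      voice: values.voice ?? 'en-US-AriaNeural',
      lang: values.lang, // undefined unless --lang/-l is given
      outputFormat: values.format ?? 'audio-24khz-48kbitrate-mono-mp3',
      pitch: values.pitch ?? 'default',
      rate: values.rate ?? 'default',
      volume: values.volume ?? 'default',
      saveSubtitles: values['save-subtitles'] ?? false,
      proxy: values.proxy,
      timeout: Number(values.timeout ?? 10000),
    },
  };
}

console.log(parseConverterArgs(['Hello world', '--rate', '+10%', '-f', 'out.mp3']).ttsOptions.rate); // +10%
```

One consequence of using `util.parseArgs` in strict mode: a negative value must be attached with `=` (for example `--rate=-20%`), because a separate argument starting with `-` is rejected as ambiguous.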

#### Configuration Manager

```bash
cd scripts
npm install

# Set persistent defaults
node config-manager.js --set-voice en-US-AriaNeural
node config-manager.js --set-rate +10%

# Show the current settings
node config-manager.js --get

# Restore the defaults
node config-manager.js --reset
```

### Voice Selection

Common voices (use `--list-voices` for full list):

**English:**
- `en-US-MichelleNeural` (female, natural, **default**)
- `en-US-AriaNeural` (female, natural)
- `en-US-GuyNeural` (male, natural)
- `en-GB-SoniaNeural` (female, British)
- `en-GB-RyanNeural` (male, British)

**Other Languages:**
- `es-ES-ElviraNeural` (Spanish, Spain)
- `fr-FR-DeniseNeural` (French)
- `de-DE-KatjaNeural` (German)
- `ja-JP-NanamiNeural` (Japanese)
- `zh-CN-XiaoxiaoNeural` (Chinese)
- `ar-SA-ZariyahNeural` (Arabic)
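The voice names above follow a `<language>-<REGION>-<Name>Neural` pattern, so a script could derive the `--lang` value from the chosen voice. The helper below is a hypothetical sketch that assumes exactly that pattern; voices with extra segments in their names would need a looser regex.

```javascript
// Hypothetical helper: split a voice name such as "zh-CN-XiaoxiaoNeural"
// into locale and speaker, assuming the <lang>-<REGION>-<Name>Neural pattern.
function parseVoiceName(voice) {
  const match = /^([a-z]{2,3})-([A-Z]{2})-([A-Za-z]+?)(Neural)?$/.exec(voice);
  if (!match) throw new Error(`Unrecognized voice name: ${voice}`);
  const [, language, region, speaker, neural] = match;
  return {
    locale: `${language}-${region}`, // usable as the --lang value
    speaker,
    neural: Boolean(neural), // true when the name ends in "Neural"
  };
}

console.log(parseVoiceName('es-ES-ElviraNeural').locale); // es-ES
```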

### Rate Guidelines

Rate values use a percentage format:
- `default`: Normal speed
- `-20%` to `-10%`: Slow, clear (tutorials, stories, accessibility)
- `+10%` to `+20%`: Slightly fast (summaries)
- `+30%` to `+50%`: Fast (news, efficiency)
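These guidelines translate into a small lookup plus a format check. The preset names and the exact value picked inside each range are illustrative assumptions, and the check requires an explicit sign (`+10%`, not `10%`), matching the examples above. The same check can be applied to `--pitch` and `--volume`, which use the same percentage form.

```javascript
// Hypothetical presets: one illustrative value picked inside each documented range.
const RATE_PRESETS = {
  clear: '-15%', // tutorials, stories, accessibility (-20% to -10%)
  normal: 'default',
  summary: '+15%', // summaries (+10% to +20%)
  fast: '+40%', // news, efficiency (+30% to +50%)
};

// Accepts "default" or a signed whole-number percentage such as "+10%" or "-20%".
function isValidProsody(value) {
  return value === 'default' || /^[+-]\d{1,3}%$/.test(value);
}

function rateFor(useCase) {
  // Own-property check so names like "toString" are not treated as presets.
  if (!Object.prototype.hasOwnProperty.call(RATE_PRESETS, useCase)) {
    throw new Error(`Unknown use case: ${useCase}`);
  }
  return RATE_PRESETS[useCase];
}

console.log(rateFor('summary'), isValidProsody('+10%'), isValidProsody('10%')); // +15% true false
```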

### Output Formats

Choose audio quality based on the use case:
- `audio-24khz-48kbitrate-mono-mp3`: Standard quality (voice notes, messages)
- `audio-24khz-96kbitrate-mono-mp3`: High quality (presentations, content)
- `audio-48khz-96kbitrate-stereo-mp3`: Highest quality (professional audio, music)
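Each format name encodes sample rate, bitrate, channel layout, and codec. At a constant bitrate, file size grows linearly with duration (bytes = kbit/s × 1000 × seconds / 8), so the standard 48 kbit/s format produces roughly 360 KB per minute of speech. The helpers below are a hypothetical sketch that assumes constant-bitrate output and ignores container overhead.

```javascript
// Hypothetical helper: decode an output-format name into its parts.
function parseOutputFormat(format) {
  const match = /^audio-(\d+)khz-(\d+)kbitrate-(mono|stereo)-(\w+)$/.exec(format);
  if (!match) throw new Error(`Unrecognized format: ${format}`);
  const [, khz, kbps, channels, codec] = match;
  return { sampleRateHz: Number(khz) * 1000, bitrateKbps: Number(kbps), channels, codec };
}

// Rough size estimate, assuming constant bitrate and no header overhead.
function estimateBytes(format, seconds) {
  return (parseOutputFormat(format).bitrateKbps * 1000 * seconds) / 8;
}

console.log(estimateBytes('audio-24khz-48kbitrate-mono-mp3', 60)); // 360000
```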

## Resources

### scripts/tts-converter.js

Main TTS conversion script using node-edge-tts. Generates audio files with customizable voice, rate, volume, pitch, and format. Supports subtitle generation and voice listing.

### scripts/config-manager.js

Manages persistent user preferences for TTS settings (voice, language, format, pitch, rate, volume). Stores the config in `~/.tts-config.json`.
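A minimal sketch of that kind of persistence: preferences stored as JSON and merged over defaults. The function names and default values are assumptions for illustration (the default voice is taken from the Notes section), and the file path is a parameter so the sketch can be exercised against a temporary file instead of `~/.tts-config.json`.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Assumed defaults for illustration; the real script's defaults may differ.
const DEFAULTS = {
  voice: 'en-US-MichelleNeural',
  lang: 'en-US',
  format: 'audio-24khz-48kbitrate-mono-mp3',
  pitch: 'default',
  rate: 'default',
  volume: 'default',
};

const DEFAULT_PATH = path.join(os.homedir(), '.tts-config.json');

// Saved values override defaults; a missing file means "all defaults".
function getConfig(file = DEFAULT_PATH) {
  if (!fs.existsSync(file)) return { ...DEFAULTS };
  return { ...DEFAULTS, ...JSON.parse(fs.readFileSync(file, 'utf8')) };
}

// Merge updates into the current config and write the result back.
function setConfig(updates, file = DEFAULT_PATH) {
  const next = { ...getConfig(file), ...updates };
  fs.writeFileSync(file, JSON.stringify(next, null, 2) + '\n');
  return next;
}

// Delete the file so every setting falls back to its default.
function resetConfig(file = DEFAULT_PATH) {
  fs.rmSync(file, { force: true });
  return { ...DEFAULTS };
}
```

For example, `setConfig({ voice: 'en-US-GuyNeural' })` plays roughly the role of `node config-manager.js --set-voice en-US-GuyNeural`.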

### scripts/package.json

NPM package configuration with the node-edge-tts dependency.

### references/node_edge_tts_guide.md

Complete documentation for the node-edge-tts npm package, including:
- Full voice list by language
- Prosody options (rate, pitch, volume)
- Usage examples (CLI and module)
- Subtitle generation
- Output formats
- Best practices and limitations


Refer to `references/node_edge_tts_guide.md` when you need specific voice details or advanced features.

## Installation

To use the bundled scripts:

```bash
cd /home/user/clawd/skills/public/tts-skill/scripts
npm install
```

This installs:
- `node-edge-tts`: TTS library
- `commander`: CLI argument parsing

## Workflow

1. **Detect intent**: Check the user message for the "tts" trigger or keyword.
2. **Choose method**: Use the built-in `tts` tool for simple requests, or `scripts/tts-converter.js` for customization.
3. **Generate audio**: Convert the target text (message, search results, summary).
4. **Return to user**: The `tts` tool returns a `MEDIA:` path; Clawdbot handles delivery.
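Steps 1 and 2 amount to a small decision rule. In the sketch below, `chooseTtsMethod` is hypothetical; it treats the keywords from the Notes section as the trigger and assumes that any explicit voice, language, prosody, format, or subtitle option means the request needs the bundled script.

```javascript
// Hypothetical sketch of workflow steps 1-2.
const TRIGGER = /\b(?:tts|text-to-speech)\b/i;
const CUSTOM_KEYS = ['voice', 'lang', 'rate', 'pitch', 'volume', 'format', 'saveSubtitles'];

function chooseTtsMethod(message, options = {}) {
  if (!TRIGGER.test(message)) return null; // step 1: no TTS intent detected
  const customized = CUSTOM_KEYS.some((key) => options[key] !== undefined);
  // step 2: built-in tool for simple requests, converter script otherwise
  return customized ? 'scripts/tts-converter.js' : 'tts';
}

console.log(chooseTtsMethod('tts: read the summary')); // tts
console.log(chooseTtsMethod('tts this', { rate: '+20%' })); // scripts/tts-converter.js
```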

## Testing

### Basic Test

Run the test script to verify TTS functionality:

```bash
cd /home/user/clawd/skills/public/edge-tts/scripts
npm test
```

This generates a test audio file and verifies that the TTS service is working.

### Voice Testing

Test different voices and preview audio quality at https://tts.travisvn.com/

### Integration Test

Use the built-in `tts` tool for quick testing:

```javascript
// Example: test TTS with default settings
tts("This is a test of the TTS functionality.")
```

### Configuration Test

Verify configuration persistence:

```bash
cd /home/user/clawd/skills/public/edge-tts/scripts
node config-manager.js --get
node config-manager.js --set-voice en-US-GuyNeural
node config-manager.js --get
```

## Troubleshooting

- **Test connectivity**: Run `npm test` to check whether the TTS service is accessible.
- **Check voice availability**: Use `node tts-converter.js --list-voices` to see available voices.
- **Verify proxy settings**: If you use a proxy, test with `node tts-converter.js "test" --proxy http://localhost:7890`.
- **Check audio output**: The test should generate `test-output.mp3` in the scripts directory.
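For the last check, a quick way to confirm that `test-output.mp3` is a plausible MP3 rather than an empty or error file is to inspect its first bytes. This is a heuristic sketch, not a decoder: it accepts either an ID3 tag or an MPEG frame-sync header. `looksLikeMp3` is a hypothetical helper.

```javascript
const fs = require('node:fs');

// Heuristic: true if the file exists and starts with an ID3 tag
// or an MPEG audio frame sync (11 set bits: 0xFF then top 3 bits set).
function looksLikeMp3(file) {
  let buf;
  try {
    buf = fs.readFileSync(file);
  } catch {
    return false; // missing or unreadable
  }
  if (buf.length < 3) return false;
  const hasId3 = buf.toString('latin1', 0, 3) === 'ID3';
  const hasFrameSync = buf[0] === 0xff && (buf[1] & 0xe0) === 0xe0;
  return hasId3 || hasFrameSync;
}

console.log(looksLikeMp3('test-output.mp3') ? 'test-output.mp3 looks OK' : 'test-output.mp3 missing or invalid');
```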

## Notes

- node-edge-tts uses Microsoft Edge's online TTS service (with up-to-date, working authentication)
- No API key is needed (free service)
- Output is MP3 by default
- Requires an internet connection
- Supports subtitle generation (JSON with word-level timing)
- **Temporary File Handling**: By default, audio files are saved to the system's temporary directory (`/tmp/edge-tts-temp/` on Unix, `C:\Users\<user>\AppData\Local\Temp\edge-tts-temp\` on Windows) with unique filenames (e.g., `tts_1234567890_abc123.mp3`). Files are not deleted automatically; the calling application (Clawdbot) should clean them up after use. Use the `--output` option to set a custom path if you need permanent storage.
- **TTS keyword filtering**: The skill automatically filters TTS-related keywords (tts, TTS, text-to-speech) out of the text before conversion so that the trigger words themselves are not spoken
- For recurring preferences, use `config-manager.js` to set defaults
- **Default voice**: `en-US-MichelleNeural` (female, natural)
- Neural voices (names ending in `Neural`) provide higher quality than Standard voices
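The temporary-file convention above can be sketched with Node's standard `os`, `path`, `fs`, and `crypto` modules. The helper names are hypothetical, and the 6-character hex suffix is an assumption modeled on the `abc123` example; the script's real suffix may be generated differently.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical sketch: <tmpdir>/edge-tts-temp/tts_<timestamp>_<random>.mp3
function makeTempAudioPath() {
  const dir = path.join(os.tmpdir(), 'edge-tts-temp');
  fs.mkdirSync(dir, { recursive: true }); // no-op if it already exists
  const suffix = crypto.randomBytes(3).toString('hex'); // 6 hex characters
  return path.join(dir, `tts_${Date.now()}_${suffix}.mp3`);
}

// Cleanup the caller (e.g. Clawdbot) performs once the audio has been delivered.
function cleanupAudio(file) {
  fs.rmSync(file, { force: true }); // ignore files that are already gone
}
```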

More Products