Voice Service Configuration Assistant
⚠️ Installation Notes
This skill contains the following files. All must be deployed to the skills directory:
voice-process-helper/
├── SKILL.md
└── scripts/
    ├── check-tts.sh             ← TTS readiness check
    ├── check-asr.sh             ← ASR readiness check
    ├── install-tts.sh           ← TTS one-click install (param: tagged|always)
    ├── install-asr.sh           ← ASR one-click install (param: tiny|base)
    ├── restart-gateway.sh       ← Gateway restart
    └── edge-tts-universal/
        ├── index.js             ← TTS plugin code
        └── openclaw.plugin.json ← TTS plugin manifest
When installing from a zip archive, the entire directory (including scripts/) must be extracted. Do not copy SKILL.md alone.
Route to the TTS (voice reply) or ASR (speech-to-text) flow based on user intent.
⚠️ Mandatory Rules
Enter the ASR flow immediately upon receiving any of the following:
- Audio attachments in the [media attached: ...] format (audio/ogg, audio/mp3, etc.)
- Feishu voice message raw JSON: {"file_key":"...","duration":...}, but only when the user's entire message is this JSON (no other context). If the JSON appears inside file content, SKILL.md text, code blocks, or documentation quotes, it is NOT a voice message; do not trigger ASR.
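The two triggers above can be checked with a small predicate. This is a minimal sketch, not part of the skill's scripts: the function name should_enter_asr is hypothetical, and it assumes the attachment's MIME type appears inside the [media attached: ...] marker and that the Feishu JSON lists file_key before duration. The ^...$ anchors are what keep JSON quoted inside other text from triggering ASR.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in scripts/): returns 0 when a raw incoming
# message should enter the ASR flow under the Mandatory Rules.
should_enter_asr() {
  local msg="$1"

  # Trigger 1: audio attachment marker, e.g. "[media attached: ... (audio/ogg)]".
  # Assumption: the MIME type appears inside the bracketed marker.
  local media_re='\[media attached:[^]]*audio/'
  if [[ $msg =~ $media_re ]]; then
    return 0
  fi

  # Trigger 2: the WHOLE message is a bare Feishu voice JSON object.
  # Anchored with ^...$ so JSON embedded in other text never matches.
  # Assumption: keys appear in the order file_key, duration.
  local feishu_re='^[[:space:]]*[{][[:space:]]*"file_key"[[:space:]]*:[[:space:]]*"[^"]+"[[:space:]]*,[[:space:]]*"duration"[[:space:]]*:[[:space:]]*[0-9]+([.][0-9]+)?[[:space:]]*[}][[:space:]]*$'
  [[ $msg =~ $feishu_re ]]
}
```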
Processing rules:
- ASR ready → the framework transcribes automatically; process the transcription result directly
- ASR not ready → enter ASR configuration flow immediately; do not simply reply "I can't process voice"
Core Principles
Different solutions involve cost, privacy, and resource trade-offs — users must make informed decisions. Always present options first, wait for explicit user selection, then execute.
⚠️ Restart Rules
After TTS and/or ASR installation, the Gateway must be restarted for changes to take effect.
Before restarting, tell the user:
⏳ Restarting the service. This takes about 1 minute, please wait…
bash <skill_dir>/scripts/restart-gateway.sh
After restart:
✅ Service restarted! Please send /new to start a new session.
If installing both TTS and ASR, restart only once after all installations — do not restart in between.
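The "restart only once" rule can be expressed as a dry-run planner that emits every installer first and the restart last. This is a sketch under assumptions: build_install_plan and the "none" placeholder are hypothetical, and it only prints commands; it does not run anything itself.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run planner: prints the commands for a combined install,
# with the Gateway restart appearing exactly once, after every installer.
#   $1 = skill directory, $2 = TTS mode (tagged|always|none),
#   $3 = Whisper model (tiny|base|none)
build_install_plan() {
  local skill_dir="$1" tts_mode="$2" asr_model="$3"
  local installed=0

  case "$tts_mode" in
    tagged|always)
      echo "bash $skill_dir/scripts/install-tts.sh $tts_mode"
      installed=1 ;;
  esac

  case "$asr_model" in
    tiny|base)
      echo "bash $skill_dir/scripts/install-asr.sh $asr_model"
      installed=1 ;;
  esac

  # Single restart at the very end, and only if something was installed.
  if (( installed )); then
    echo "bash $skill_dir/scripts/restart-gateway.sh"
  fi
}

# Usage: review the plan, tell the user about the ~1 minute restart, then run
# it; "bash -e" stops at the first failing installer, before the restart.
#   build_install_plan "$SKILL_DIR" tagged base | bash -e
```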
Part 1: TTS (Voice Reply)
Step 1 — Check Readiness
bash <skill_dir>/scripts/check-tts.sh
The script returns JSON with status:
- ready → TTS is fully operational. Read the auto field and follow the tag rules below.
- partial → Config exists but something is missing (binary, plugin, or plugins.allow). Run install-tts.sh to fix it, then restart the Gateway.
- not_configured → Proceed to present the options.
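The status routing above can be sketched without a jq dependency. This assumes status and auto are plain top-level string fields in the JSON printed by check-tts.sh (the field names come from this section; the exact JSON layout is an assumption), and the action labels printed by tts_next_step are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for routing on check-tts.sh output without jq.
# Assumption: "status" and "auto" are plain string fields in that JSON.

# Print the first string value stored under key $2 in the JSON text $1.
json_string_field() {
  printf '%s\n' "$1" |
    sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" |
    head -n 1
}

# Map the readiness status to the next step described in Step 1.
tts_next_step() {
  local status auto
  status="$(json_string_field "$1" status)"
  auto="$(json_string_field "$1" auto)"
  case "$status" in
    ready)          echo "ask-keep-or-reconfigure auto=$auto" ;;
    partial)        echo "run-install-tts-then-restart" ;;
    not_configured) echo "present-options" ;;
    *)              echo "unknown-status"; return 1 ;;
  esac
}

# Usage:
#   tts_next_step "$(bash <skill_dir>/scripts/check-tts.sh)"
```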
If status is ready, inform the user and ask:
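TTS provider currently configured: {provider}, auto mode: {auto}. Would you like to:
- Keep using it: keep the current configuration, and I'll reply to you by voice right away
- Reconfigure: overwrite the current configuration and choose a TTS option again
If the user chooses "Keep using it" → follow the tag rules based on the auto value (see "TTS Tag Format" below).
If the user chooses "Reconfigure" → proceed to present the options.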
Step 2 — Present Options
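A: edge-tts-universal (free, recommended). Automatically adapts to every channel's audio format with no extra configuration:
Feishu / Telegram / WhatsApp / Matrix → OGG (native voice bubble)
WeCom → AMR (native voice message)
QQbot → MP3 (native voice message)
Slack / others → MP3
A1: Smart voice replies: the AI decides which replies should be spoken (recommended)
A2: Reply to every message by voice
B: Tencent Cloud Text-to-Speech (paid, more natural voice quality). Supports multiple Chinese voices; new users get a free quota. Requires a Tencent Cloud SecretID / SecretKey.
Please reply A1, A2, or B.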
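Because the skill must wait for an explicit choice, a reply handler should map only the exact options and reject anything else. A minimal sketch; tts_choice_to_command is a hypothetical name, and it only prints the Step 3 command rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn the user's Step 2 reply into the Step 3 command.
# The reply is trimmed and upper-cased; anything unrecognized returns 1,
# meaning "ask again" rather than guessing a default.
tts_choice_to_command() {
  local skill_dir="$1" reply
  reply="$(printf '%s' "$2" | tr -d '[:space:]' | tr '[:lower:]' '[:upper:]')"
  case "$reply" in
    A1) echo "bash $skill_dir/scripts/install-tts.sh tagged" ;;
    A2) echo "bash $skill_dir/scripts/install-tts.sh always" ;;
    B)  echo "skillhub install tencentcloud-tts" ;;
    *)  return 1 ;;
  esac
}
```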
Step 3 — Install
User selects A1 or A2
⏳ Deploying the voice service…
bash <skill_dir>/scripts/install-tts.sh tagged # A1
bash <skill_dir>/scripts/install-tts.sh always # A2
The script installs edge-tts and ffmpeg, deploys the plugin, and writes the config. After installation, restart the Gateway per "Restart Rules".
User selects B — Tencent Cloud TTS
⏳ Installing the Tencent Cloud TTS plugin…
skillhub install tencentcloud-tts
Prompt user for Tencent Cloud SecretID / SecretKey, then:
openclaw config set skills.entries.tencentcloud-tts.env --strict-json '{"secret_id":"<ID>","secret_key":"<KEY>"}'
✅ Tencent Cloud TTS is configured! Ask me to reply by voice to try it out.
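Pasting credentials straight into the --strict-json string breaks if a value contains a quote. A defensive sketch, assuming Tencent Cloud SecretID / SecretKey values are alphanumeric (an assumption, so anything else is rejected rather than escaped); tencent_env_json is a hypothetical helper, and the same payload works for tencentcloud-asr.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the --strict-json payload for
# "openclaw config set skills.entries.<skill>.env".
# Assumption: SecretID / SecretKey are alphanumeric; other values are refused.
tencent_env_json() {
  local secret_id="$1" secret_key="$2"
  local key_re='^[A-Za-z0-9]+$'
  if [[ ! $secret_id =~ $key_re || ! $secret_key =~ $key_re ]]; then
    echo "SecretID and SecretKey must be non-empty and alphanumeric" >&2
    return 1
  fi
  printf '{"secret_id":"%s","secret_key":"%s"}\n' "$secret_id" "$secret_key"
}

# Usage (config is only written if both values pass the check):
#   json="$(tencent_env_json "$SECRET_ID" "$SECRET_KEY")" &&
#     openclaw config set skills.entries.tencentcloud-tts.env --strict-json "$json"
```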
TTS Tag Format
auto = always: Output plain text only; no [[tts: tags are allowed. The framework converts the full text to speech automatically, and any tags you add leak into the reply as raw text.
auto = tagged: Wrap content to be spoken with [[tts:text]]...[[/tts:text]].
[[tts:text]]content to speak[[/tts:text]]
Followed by normal text.
⚠️ Only [[tts:text]]...[[/tts:text]] is recognized. [[tts]]...[[/tts]] is wrong; never omit :text.
❌ Wrong formats: [[tts:…]], [[tts]]content[[/tts]], [[tts:voice=xxx]], mismatched tags.
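The tag rules can be checked mechanically before a reply is sent. A minimal, count-based sketch: wrap_tts and check_tts_tags are hypothetical names, not part of the plugin. It verifies that auto = always replies contain no tags and that auto = tagged replies use only balanced [[tts:text]] / [[/tts:text]] pairs; it does not verify that each open tag precedes its close tag.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the TTS tag rules (not part of the plugin).
OPEN_TAG='[[tts:text]]'
CLOSE_TAG='[[/tts:text]]'

# Wrap text in the only recognized tag pair.
wrap_tts() {
  printf '%s%s%s' "$OPEN_TAG" "$1" "$CLOSE_TAG"
}

# Return 0 if reply $2 obeys the tag rules for auto mode $1 (always|tagged).
check_tts_tags() {
  local mode="$1" reply="$2"
  if [[ $mode == always ]]; then
    # auto = always: no tags at all, otherwise they leak as raw text.
    [[ $reply != *"[[tts"* && $reply != *"[[/tts"* ]]
    return
  fi
  # auto = tagged: count valid open/close tags by deleting them literally.
  local no_open="${reply//"$OPEN_TAG"/}"
  local no_close="${reply//"$CLOSE_TAG"/}"
  local n_open=$(( (${#reply} - ${#no_open}) / ${#OPEN_TAG} ))
  local n_close=$(( (${#reply} - ${#no_close}) / ${#CLOSE_TAG} ))
  # Any "[[tts" / "[[/tts" left after removing valid tags is malformed.
  local rest="${no_open//"$CLOSE_TAG"/}"
  (( n_open == n_close )) && [[ $rest != *"[[tts"* && $rest != *"[[/tts"* ]]
}
```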
Language-Voice Mapping
| Language | lang | voice |
|----------|------|-------|
| Chinese | zh-CN | zh-CN-XiaoxiaoNeural |
| English | en-US | en-US-AriaNeural |
| Japanese | ja-JP | ja-JP-NanamiNeural |
| Korean | ko-KR | ko-KR-SunHiNeural |
| French | fr-FR | fr-FR-DeniseNeural |
| German | de-DE | de-DE-KatjaNeural |
| Spanish | es-ES | es-ES-ElviraNeural |
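The language-voice mapping above translates directly into a lookup. A sketch only: voice_for_lang is hypothetical, it returns 1 for unlisted languages rather than inventing a fallback, and the usage line assumes the edge-tts command-line tool is on PATH for a manual test.

```bash
#!/usr/bin/env bash
# Hypothetical lookup mirroring the language-voice mapping: lang -> voice.
voice_for_lang() {
  case "$1" in
    zh-CN) echo "zh-CN-XiaoxiaoNeural" ;;
    en-US) echo "en-US-AriaNeural" ;;
    ja-JP) echo "ja-JP-NanamiNeural" ;;
    ko-KR) echo "ko-KR-SunHiNeural" ;;
    fr-FR) echo "fr-FR-DeniseNeural" ;;
    de-DE) echo "de-DE-KatjaNeural" ;;
    es-ES) echo "es-ES-ElviraNeural" ;;
    *)     return 1 ;;  # not in the table; the caller decides what to do
  esac
}

# Manual test with the edge-tts CLI (assumed installed):
#   edge-tts --voice "$(voice_for_lang ja-JP)" --text "こんにちは" --write-media hello.mp3
```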
Part 2: ASR (Speech Recognition)
Trigger Conditions
- User explicitly requests transcription ("transcribe this to text for me", "recognize this voice message", etc.)
- User sends a voice message or audio attachment — receiving audio content is itself a trigger
Step 1 — Check Readiness
bash <skill_dir>/scripts/check-asr.sh
The script returns JSON with status:
- ready → ASR is fully operational (config_enabled, config_model, and whisper_installed all OK). Inform the user and end the flow.
- partial → Something is missing. Show the user which parts are incomplete, run install-asr.sh to fix them, then restart the Gateway.
- not_configured → Proceed to Step 2 (probe machine resources), then present the options.
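For the partial case, the three checks named above can be turned into a list of missing parts to show the user. A sketch under stated assumptions: asr_missing_parts and json_raw_field are hypothetical, and the JSON value types are not documented here, so it assumes each field holds a boolean or a string (such as a model name) and treats false, null, empty, or absent values as missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list which checks reported by check-asr.sh are not OK.
# Assumption: fields hold booleans or strings; false/null/empty = missing.

# Print the raw value (up to the next comma or brace) stored under key $2.
json_raw_field() {
  printf '%s\n' "$1" |
    sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\([^,}]*\).*/\1/p" |
    head -n 1
}

asr_missing_parts() {
  local json="$1" key val
  local missing=()
  for key in config_enabled config_model whisper_installed; do
    # Strip quotes and whitespace so "base", true, false compare cleanly.
    val="$(json_raw_field "$json" "$key" | tr -d '"[:space:]')"
    case "$val" in
      ""|false|null) missing+=("$key") ;;
    esac
  done
  if (( ${#missing[@]} > 0 )); then
    printf '%s\n' "${missing[@]}"
  fi
}

# Usage:
#   asr_missing_parts "$(bash <skill_dir>/scripts/check-asr.sh)"
```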
Step 2 — Probe Machine Resources
echo "=== CPU ===" && nproc && echo "=== MEM ===" && free -h | head -2 && echo "=== DISK ===" && df -h / | tail -1 && echo "=== GPU ===" && (nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null || echo "No GPU")
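The probe output can also be condensed into the one-line machine summary shown at the top of Step 3. A sketch only: machine_summary is hypothetical, it swaps in getconf for nproc and reads /proc/meminfo (Linux only; other systems report "unknown" memory), and it falls back to "No GPU" when nvidia-smi is absent or fails.

```bash
#!/usr/bin/env bash
# Hypothetical helper: one-line summary of CPU, memory, free disk, and GPU.
machine_summary() {
  local cpus mem disk gpu
  cpus="$(getconf _NPROCESSORS_ONLN)"
  # MemTotal is reported in kB; divide by 1048576 for GB (binary units).
  mem="$(awk '/^MemTotal:/ { printf "%.1f GB", $2 / 1048576 }' /proc/meminfo 2>/dev/null)"
  [[ -n $mem ]] || mem="unknown"
  # df -P: single-line POSIX output; -k: 1K blocks; column 4 is "Available".
  disk="$(df -Pk / | awk 'NR == 2 { printf "%.1f GB", $4 / 1048576 }')"
  # Fall back to "No GPU" if nvidia-smi is missing, fails, or prints nothing.
  if ! gpu="$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null)" || [[ -z $gpu ]]; then
    gpu="No GPU"
  fi
  printf 'Current machine: %s cores / %s RAM / %s free disk / %s\n' \
    "$cpus" "$mem" "$disk" "$gpu"
}
```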
Step 3 — Present Options (Wait for User Selection)
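Current machine: [CPU] cores / [memory] / [free disk] / [GPU info]
A: Whisper speech recognition (open source, free, runs locally). Fully offline and privacy-safe. Uses a CPU-optimized install that needs about 400MB. Supports 99 languages. Installation takes about 5 to 10 minutes.
B: Tencent Cloud ASR (paid, commercial grade). Supports Mandarin, Cantonese, English, Japanese, and more. Three modes: sentence recognition (≤60s), express (≤2h), and long audio (≤5h). Installation takes about 1 to 2 minutes. Tencent Cloud credentials must be configured after installation; new users get a free quota, which can be claimed from the Tencent Cloud ASR console.
Please reply A or B.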
Step 4 — Install
User selects A — Whisper
⏳ Installing Whisper speech recognition (CPU-optimized, about 400MB). This takes about 5 to 10 minutes…
Model selection: default base (~140MB) for better accuracy; use tiny (~75MB) only when disk space is critically limited.
bash <skill_dir>/scripts/install-asr.sh base
After installation, restart Gateway per "Restart Rules".
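The model-selection rule (base by default, tiny only when disk space is critically limited) can be sketched as a disk check. The threshold is an assumption, not from the skill: "critically limited" is approximated as less room than the ~400MB install plus the ~140MB base model (540 MB), treating those as separate, and it can be overridden. avail_disk_mb and choose_whisper_model are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the Whisper model choice.
# Assumption: default threshold = ~400MB install + ~140MB base model = 540 MB.

# Free space in MB on the filesystem holding path $1 (default /).
avail_disk_mb() {
  # -P: POSIX single-line output; -k: 1K blocks; column 4 is "Available".
  df -Pk "${1:-/}" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Echo "tiny" or "base" given free space in MB ($1) and a threshold ($2).
choose_whisper_model() {
  local avail_mb="$1" threshold_mb="${2:-540}"
  if (( avail_mb < threshold_mb )); then
    echo tiny
  else
    echo base
  fi
}

# Usage:
#   bash <skill_dir>/scripts/install-asr.sh "$(choose_whisper_model "$(avail_disk_mb /)")"
```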
User selects B — Tencent Cloud ASR
⏳ Installing the Tencent Cloud ASR plugin…
skillhub install tencentcloud-asr
Prompt user for Tencent Cloud SecretID / SecretKey, then:
openclaw config set skills.entries.tencentcloud-asr.env --strict-json '{"secret_id":"<ID>","secret_key":"<KEY>"}'
✅ Tencent Cloud ASR is configured! Send a voice message or audio file and I'll transcribe it.