Feishu Voice Send
Send audio as native Feishu voice messages. Supports multi-language TTS (Chinese, English, etc.) and STT via Whisper.
Features
- 🎙️ Receive Voice: receive .ogg voice messages, transcribe to text
- 🔊 Send Voice: prefer MiniMax TTS, auto-fallback to Edge TTS when quota is insufficient
- ✅ Native Format: sent voice appears as a voice bubble in Feishu (not a file attachment)
- 🌍 Multi-language: Chinese, English, etc. via MiniMax and Edge TTS
TTS Engine Selection Logic
Send voice request
↓
Check MiniMax speech-hd quota (current_interval_total_count - current_interval_usage_count)
↓
Quota > 0 → MiniMax TTS (speech-2.8-hd) ✅
Quota ≤ 0 → Edge TTS (zh-CN-XiaoxiaoNeural) ✅
Quota check: run mmx quota show --output json, find the speech_generation category, and compute the remaining count (total minus used).
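The selection rule above can be sketched as two small pure functions. This is a minimal sketch, not the script's actual code: the function names and the sample payload are hypothetical, and the payload shape is assumed from the field names used elsewhere in this document (category_remains, current_interval_total_count, current_interval_usage_count).

```python
from typing import Any


def remaining_speech_quota(payload: dict[str, Any]) -> int:
    """Remaining speech_generation calls in the current interval (0 if absent)."""
    for cat in payload.get("category_remains", []):
        if cat.get("category") == "speech_generation":
            total = cat.get("current_interval_total_count", 0)
            used = cat.get("current_interval_usage_count", 0)
            return total - used
    return 0


def choose_engine(remaining: int) -> str:
    """Quota > 0 selects MiniMax; anything else falls back to Edge TTS."""
    return "minimax" if remaining > 0 else "edge"


# Hypothetical payload, shaped after the field names used in this document.
sample = {
    "category_remains": [
        {
            "category": "speech_generation",
            "current_interval_total_count": 100,
            "current_interval_usage_count": 97,
        }
    ]
}
print(choose_engine(remaining_speech_quota(sample)))  # minimax
```

Keeping the decision separate from the mmx subprocess call makes the rule easy to unit-test without network access or a configured API key.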
Architecture
User voice → .ogg received → Whisper STT → understand → reply content
                                                            ↓
User ← Feishu voice bubble ← Ogg/Opus convert ← MP3 TTS ← text
                                    ↑              ↑
                              PyAV convert  MiniMax / Edge
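The receive half of the diagram (.ogg → Whisper STT → text) is not shown in code in this document. A minimal sketch, assuming the openai-whisper package and its load_model / transcribe calls; the function name, the "base" model size, and the injectable transcriber parameter are illustrative choices, not taken from the original scripts.

```python
from typing import Callable, Optional


def transcribe_voice(
    ogg_path: str,
    transcriber: Optional[Callable[[str], dict]] = None,
) -> str:
    """Return the text spoken in a received .ogg voice message."""
    if transcriber is None:
        # Lazy import so this module loads even where Whisper is not installed.
        import whisper  # pip install openai-whisper (also needs ffmpeg)

        model = whisper.load_model("base")  # model size is an assumption
        transcriber = model.transcribe
    result = transcriber(ogg_path)  # Whisper returns a dict with a "text" key
    return result["text"].strip()


# Stand-in transcriber so the example runs without downloading a model.
def fake_transcriber(path: str) -> dict:
    return {"text": " 你好 ", "language": "zh"}


print(transcribe_voice("voice.ogg", transcriber=fake_transcriber))
```

In a long-running process it would likely be better to load the model once and reuse it, since loading on every call is slow.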
Implementation
Sending Voice (text → Ogg/Opus)
Main entry: send_feishu_voice_unified.py
import subprocess, av, os, re, sys, json, tempfile
EDGE_TTS_SCRIPT = "/home/node/.openclaw/plugin-skills/edge-tts/scripts/tts-converter.js"
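def check_minimax_quota() -> int:
    """Return the remaining MiniMax speech quota, or 0 if it cannot be read."""
    try:
        result = subprocess.run(['mmx', 'quota', 'show', '--output', 'json'], capture_output=True, text=True, check=True)
        data = json.loads(result.stdout)
    except (OSError, subprocess.CalledProcessError, json.JSONDecodeError):
        return 0  # treat any failure as "no quota" so that Edge TTS is used
    for cat in data.get('category_remains', []):
        if cat.get('category') == 'speech_generation':
            return cat.get('current_interval_total_count', 0) - cat.get('current_interval_usage_count', 0)
    return 0

def generate_minimax_tts(text: str) -> str:
    """Synthesize text with MiniMax and return the path of the MP3 file."""
    fd, tmp = tempfile.mkstemp(suffix='.mp3')  # mktemp() is deprecated and race-prone
    os.close(fd)
    subprocess.run(['mmx', 'speech', 'synthesize', '--text', text, '--out', tmp], check=True)
    return tmp

def generate_edge_tts(text: str) -> str:
    """Synthesize text with Edge TTS and return the path reported by the script."""
    # Strip TTS-related keywords. \b treats CJK characters as word characters,
    # so 语音 is removed only when delimited by spaces or punctuation.
    text_clean = re.sub(r'\b(TTS|语音|文字转语音|text-to-speech)\b', '', text, flags=re.IGNORECASE).strip()
    result = subprocess.run(['node', EDGE_TTS_SCRIPT, text_clean, '--voice', 'zh-CN-XiaoxiaoNeural'], capture_output=True, text=True, check=True)
    match = re.search(r'Audio saved to: (.+)', result.stdout)
    if match is None:
        raise RuntimeError(f'Edge TTS did not report an output path: {result.stdout!r}')
    return match.group(1).strip()

def send_voice(text: str) -> str:
    quota = check_minimax_quota()
    mp3_path = generate_minimax_tts(text) if quota > 0 else generate_edge_tts(text)
    # convert_to_ogg (MP3 → Ogg/Opus via PyAV) is described under Format Conversion
    return convert_to_ogg(mp3_path)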
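send_voice picks the engine from the quota alone, so a MiniMax call that fails at runtime would raise rather than fall back. One possible hardening, which is not part of the original script, is to also fall back when the primary command fails. The helper name and the stand-in engines below are hypothetical.

```python
import subprocess
from typing import Callable

Synth = Callable[[str], str]  # takes text, returns the path of an MP3 file


def synthesize_with_fallback(text: str, primary: Synth, fallback: Synth) -> str:
    """Try the primary engine; if its command fails, use the fallback engine."""
    try:
        return primary(text)
    except (subprocess.CalledProcessError, OSError):
        # CalledProcessError: the CLI exited non-zero (subprocess.run(check=True)).
        # OSError: the executable could not be started at all.
        return fallback(text)


# Stand-in engines for demonstration only; real code would pass
# generate_minimax_tts and generate_edge_tts instead.
def failing_engine(text: str) -> str:
    raise subprocess.CalledProcessError(returncode=1, cmd=["mmx"])


def working_engine(text: str) -> str:
    return "/tmp/fallback.mp3"


print(synthesize_with_fallback("hello", failing_engine, working_engine))
```

Passing the engines in as callables keeps the fallback policy independent of any particular CLI, so it can be tested without mmx or node installed.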
Format Conversion (MP3 → Ogg/Opus)
Use PyAV to convert the MP3 produced by the TTS engine into Feishu's native voice format:
- Container: Ogg
- Codec: libopus
- Sample rate: 16000 Hz
- Channels: mono
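convert_to_ogg is called by send_voice, but its body is not shown in this document. The following is a sketch of what such a conversion might look like, assuming PyAV 9 or later (where AudioResampler.resample returns a list of frames) and an FFmpeg build that includes libopus. PyAV's API differs between releases, so treat the details as assumptions; the helper ogg_path_for and the two constants are hypothetical names.

```python
import os

# Target parameters listed above for Feishu voice messages.
OPUS_RATE = 16000
OPUS_LAYOUT = "mono"


def ogg_path_for(mp3_path: str) -> str:
    """Derive the output path by swapping the extension for .ogg."""
    return os.path.splitext(mp3_path)[0] + ".ogg"


def convert_to_ogg(mp3_path: str) -> str:
    """Transcode an MP3 file to Ogg/Opus (16 kHz, mono) and return the new path."""
    import av  # imported lazily so the helper above works without PyAV

    ogg_path = ogg_path_for(mp3_path)
    with av.open(mp3_path) as src, av.open(ogg_path, mode="w", format="ogg") as dst:
        out_stream = dst.add_stream("libopus", rate=OPUS_RATE)
        out_stream.layout = OPUS_LAYOUT
        resampler = av.AudioResampler(format="s16", layout=OPUS_LAYOUT, rate=OPUS_RATE)

        # Decode the first audio stream, resample, encode, and write packets.
        for frame in src.decode(audio=0):
            for resampled in resampler.resample(frame):
                for packet in out_stream.encode(resampled):
                    dst.mux(packet)

        # Flush samples still buffered in the resampler, then the encoder.
        for resampled in resampler.resample(None):
            for packet in out_stream.encode(resampled):
                dst.mux(packet)
        for packet in out_stream.encode(None):
            dst.mux(packet)
    return ogg_path


print(ogg_path_for("/tmp/reply.mp3"))  # /tmp/reply.ogg
```

Writing the .ogg next to the source MP3 is a design choice of this sketch; a temporary directory would work equally well.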
Dependencies
| Dependency | Purpose | Install |
|------------|---------|---------|
| mmx CLI | MiniMax TTS | Installed, API Key in ~/.mmx/config.json |
| edge-tts (node) | Edge TTS fallback | Installed at /home/node/.openclaw/plugin-skills/edge-tts/ |
| PyAV | Audio format conversion | pip install av |
| Whisper | Speech recognition | pip install openai-whisper |
| soundfile | Audio file reading | pip install soundfile |
| openclaw message tool | Feishu message sending | Built into OpenClaw |
Files
| File | Description |
|------|-------------|
| send_feishu_voice_unified.py | Unified TTS sender (recommended) |
| send_feishu_voice.py | Legacy Edge TTS only version |
Limitations
- ElevenLabs and other cloud TTS services are not supported (each would need its own API key)
- Audio longer than 30 s should be split into segments
- Feishu Ogg requirements: Ogg container + Opus codec + 16 kHz + mono
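One way to keep each clip short is to split the text before synthesis rather than cutting the audio afterwards. A minimal sketch under stated assumptions: the function name is hypothetical, and max_chars is a placeholder that would need tuning against the actual speaking rate so that a chunk stays under about 30 s.

```python
import re


def split_for_tts(text: str, max_chars: int = 100) -> list[str]:
    """Group sentences into chunks of at most max_chars characters.

    Splits after Chinese or Western sentence-ending punctuation. A single
    sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[。！？.!?])", text) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += sentence
    if current.strip():
        chunks.append(current.strip())
    return chunks


print(split_for_tts("你好。今天天气不错！我们出发吧。", max_chars=10))
# ['你好。今天天气不错！', '我们出发吧。']
```

Each chunk could then be passed to send_voice separately, producing one voice bubble per segment.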
Changelog
- 2026-05-28: Added unified version: MiniMax TTS first, auto-fallback to Edge TTS on quota exhaustion
- 2026-05-17: Initial version