Whisper Transcribe

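Transcribe audio files to text with OpenAI Whisper. Supports automatic language detection, multiple output formats (txt, srt, vtt, json), batch processing, and model selection (tiny through large). Suitable for audio recordings, podcasts, voice messages, lectures, meetings, and any other audio or video transcription. Supported input formats: mp3, wav, m4a, ogg, flac, webm, opus, aac.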

Transcribe audio with scripts/transcribe.sh:

# Basic (auto-detect language, base model)
scripts/transcribe.sh recording.mp3

# German, small model, SRT subtitles
scripts/transcribe.sh --model small --language de --format srt lecture.wav

# Batch process, all formats
scripts/transcribe.sh --format all --output-dir ./transcripts/ *.mp3

# Word-level timestamps
scripts/transcribe.sh --timestamps interview.m4a
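The internals of scripts/transcribe.sh are not shown here. As a rough sketch, assuming the script is a thin wrapper around the openai-whisper CLI, one invocation could map onto the underlying command as below. The function name run_whisper and the DRY_RUN variable are hypothetical; the --model, --output_format, and --output_dir flags belong to the whisper CLI itself.

```shell
# Sketch only: one way a wrapper might call the whisper CLI.
# run_whisper and DRY_RUN are illustrative names, not part of transcribe.sh.
run_whisper() {
  model="$1"; format="$2"; outdir="$3"; file="$4"
  # Build the argument list in the positional parameters so spaces in
  # file names survive when the command is actually executed.
  set -- whisper "$file" --model "$model" --output_format "$format" --output_dir "$outdir"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"        # run whisper with the assembled arguments
  fi
}
```

With DRY_RUN set to 1, calling run_whisper small srt ./transcripts lecture.wav prints the command rather than running it, which is a convenient way to check the flags before starting a long transcription.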

Models

| Model | RAM | Speed | Accuracy | Best for |
|-------|-----|-------|----------|----------|
| tiny | ~1GB | ⚡⚡⚡ | ★★ | Quick drafts, known language |
| base | ~1GB | ⚡⚡ | ★★★ | General use (default) |
| small | ~2GB | ⚡ | ★★★★ | Good accuracy |
| medium | ~5GB | 🐢 | ★★★★★ | High accuracy |
| large | ~10GB | 🐌 | ★★★★★ | Best accuracy (slow on Pi) |
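The RAM column can drive model choice on constrained machines. The helper below is a minimal sketch, not part of the script: pick_model is a hypothetical name, and the thresholds are the approximate figures from the table, so treat them as rough guides rather than guarantees.

```shell
# Sketch: choose the largest model whose approximate RAM need fits a budget.
# Argument: available RAM in whole gigabytes (integer).
pick_model() {
  budget_gb="$1"
  if   [ "$budget_gb" -ge 10 ]; then echo large
  elif [ "$budget_gb" -ge 5 ];  then echo medium
  elif [ "$budget_gb" -ge 2 ];  then echo small
  elif [ "$budget_gb" -ge 1 ];  then echo base
  else
    echo "not enough RAM for any model" >&2
    return 1
  fi
}
```

For example, scripts/transcribe.sh --model "$(pick_model 4)" recording.mp3 would select the small model on a machine with about 4GB free.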

Output Formats

txt — Plain text transcript
srt — SubRip subtitles (for video)
vtt — WebVTT subtitles
json — Detailed JSON with timestamps and confidence
all — Generate all formats at once
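One practical difference between the two subtitle formats is the timestamp syntax: SubRip writes milliseconds after a comma (00:01:02,500), while WebVTT uses a period (00:01:02.500). The sketch below illustrates that difference only; fmt_timestamp is a hypothetical helper, and Whisper produces these timestamps itself.

```shell
# Sketch: format a time in seconds as HH:MM:SS<sep>mmm.
# Usage: fmt_timestamp SECONDS [SEPARATOR]   (separator defaults to ",")
fmt_timestamp() {
  awk -v t="$1" -v sep="${2:-,}" 'BEGIN {
    ms = int(t * 1000 + 0.5)              # round to whole milliseconds
    h = int(ms / 3600000); ms -= h * 3600000
    m = int(ms / 60000);   ms -= m * 60000
    s = int(ms / 1000);    ms -= s * 1000
    printf "%02d:%02d:%02d%s%03d\n", h, m, s, sep, ms
  }'
}
```

Calling fmt_timestamp 62.5 gives the SRT form, and fmt_timestamp 62.5 . gives the VTT form.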

Requirements

whisper CLI (pip install openai-whisper)
ffmpeg (for audio decoding)
First run downloads the model (~150MB for base)
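A quick preflight check can confirm both tools are on the PATH before starting a batch. This is a minimal sketch; check_tools is a hypothetical helper name, not something the script is known to provide.

```shell
# Sketch: report any required command-line tools that are not on the PATH.
# Returns 0 if all listed tools are found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_tools whisper ffmpeg before the first transcription surfaces a missing dependency immediately instead of partway through a batch.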