Category: Data & Analysis · No API key required

Whisper Transcribe

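Transcribes audio files to text with OpenAI Whisper. Supports automatic language detection, several output formats (txt, srt, vtt, json), batch processing, and model selection (tiny to large). Suitable for audio recordings, podcasts, voice messages, lectures, meetings, and any other audio or video transcription. Supported input formats: mp3, wav, m4a, ogg, flac, webm, opus, aac.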
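Before a batch run it can help to skip files whose extension is not in the supported list above. The helper below is a minimal sketch; `is_supported_audio` is a hypothetical name and is not part of the skill. It only checks the file name, not the actual contents.

```bash
# Hypothetical helper (not part of the skill): succeeds if the file name ends
# in one of the supported extensions listed above, ignoring case.
is_supported_audio() {
  case "$1" in
    *.*) ;;            # has an extension; keep checking
    *) return 1 ;;     # no extension at all
  esac
  # Lowercase the extension so "Talk.MP3" is accepted too.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|m4a|ogg|flac|webm|opus|aac) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, it could filter a mixed directory listing so that only the remaining audio files are handed to the script in a single batch call.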

Author: josunlp · clawhub


Transcribe audio with scripts/transcribe.sh:

```bash
# Basic (auto-detect language, base model)
scripts/transcribe.sh recording.mp3

# German, small model, SRT subtitles
scripts/transcribe.sh --model small --language de --format srt lecture.wav

# Batch process, all formats
scripts/transcribe.sh --format all --output-dir ./transcripts/ *.mp3

# Word-level timestamps
scripts/transcribe.sh --timestamps interview.m4a
```
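The script's internals are not shown here. As an illustration only, the dry-run sketch below shows how a wrapper with these options *might* map them onto the openai-whisper CLI, whose real flags include `--model`, `--language`, `--output_format`, `--output_dir` and `--word_timestamps`. The function name `build_whisper_cmd`, the `txt` default format, and the `.` default output directory are assumptions, not taken from `scripts/transcribe.sh`.

```bash
# Hypothetical dry-run sketch: how a wrapper such as scripts/transcribe.sh
# might translate its options into an openai-whisper CLI call. It prints the
# command instead of running it. Defaults (base, txt, ".") are assumptions.
build_whisper_cmd() {
  model=base fmt=txt outdir=. lang= words=
  while [ $# -gt 0 ]; do
    case "$1" in
      --model|--language|--format|--output-dir)
        # Value-taking options: fail cleanly if the value is missing.
        if [ $# -lt 2 ]; then
          echo "missing value for $1" >&2
          return 2
        fi
        case "$1" in
          --model)      model=$2 ;;
          --language)   lang=$2 ;;
          --format)     fmt=$2 ;;
          --output-dir) outdir=$2 ;;
        esac
        shift 2 ;;
      --timestamps) words=1; shift ;;
      --*) echo "unknown option: $1" >&2; return 2 ;;
      *) break ;;  # first non-option argument starts the audio file list
    esac
  done
  cmd="whisper --model $model --output_format $fmt --output_dir $outdir"
  # Passing no --language lets Whisper detect the language itself.
  if [ -n "$lang" ]; then
    cmd="$cmd --language $lang"
  fi
  # Map the wrapper's --timestamps flag to Whisper's word-level timestamps.
  if [ -n "$words" ]; then
    cmd="$cmd --word_timestamps True"
  fi
  # Whisper accepts several audio files in one invocation.
  echo "$cmd $*"
}
```

For instance, `build_whisper_cmd --model small --language de --format srt lecture.wav` prints `whisper --model small --output_format srt --output_dir . --language de lecture.wav`. Passing all files to one `whisper` call means the model is loaded once per batch rather than once per file.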

Models

| Model | RAM | Speed | Accuracy | Best for |
|-------|-----|-------|----------|----------|
| tiny | ~1GB | ⚡⚡⚡ | ★★ | Quick drafts, known language |
| base | ~1GB | ⚡⚡ | ★★★ | General use (default) |
| small | ~2GB | ⚡ | ★★★★ | Good accuracy |
| medium | ~5GB | 🐢 | ★★★★★ | High accuracy |
| large | ~10GB | 🐌 | ★★★★★ | Best accuracy (slow on Pi) |
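The RAM column suggests a simple selection rule: use the most accurate model whose approximate memory need fits the machine. The sketch below encodes the table's figures in a hypothetical `pick_model` function that takes whole gigabytes. The figures are rough, so leaving some headroom is wise.

```bash
# Hypothetical helper: print the most accurate model whose approximate memory
# need (taken from the table above) fits in the given number of gigabytes.
pick_model() {
  avail_gb=$1
  best=
  # Walk the models from smallest to largest; the last one that fits wins.
  for entry in tiny:1 base:1 small:2 medium:5 large:10; do
    name=${entry%%:*}
    need=${entry##*:}
    if [ "$avail_gb" -ge "$need" ]; then
      best=$name
    fi
  done
  if [ -z "$best" ]; then
    echo "not enough memory for any model" >&2
    return 1
  fi
  echo "$best"
}
```

Usage: `scripts/transcribe.sh --model "$(pick_model 4)" talk.mp3` would select `small` on a machine with about 4 GB to spare.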

Output Formats

  • txt — Plain text transcript
  • srt — SubRip subtitles (for video)
  • vtt — WebVTT subtitles
  • json — Detailed JSON with timestamps and confidence
  • all — Generate all formats at once
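After a batch run you may want to confirm that every expected transcript exists. The sketch below lists the file paths a run *should* produce. It assumes that each output is named after the input file with its audio extension replaced (recent openai-whisper releases behave this way) and that `all` means the four formats listed above. Note that Whisper's own `all` also writes a `tsv` file. `expected_outputs` is a hypothetical name.

```bash
# Hypothetical helper: list the transcript paths expected for one input file,
# assuming <output-dir>/<input name without extension>.<format> naming and
# that "all" means txt, srt, vtt and json.
expected_outputs() {
  fmt=$1 outdir=$2 audio=$3
  base=${audio##*/}     # strip any directory part
  base=${base%.*}       # strip the audio extension
  if [ "$fmt" = all ]; then
    set -- txt srt vtt json
  else
    set -- "$fmt"
  fi
  for ext in "$@"; do
    echo "${outdir%/}/$base.$ext"   # avoid a double slash if outdir ends in /
  done
}
```

Each printed path can then be checked with `test -s` to catch files that were skipped or came out empty.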

Requirements

  • whisper CLI (pip install openai-whisper)
  • ffmpeg (for audio decoding)
  • The first run downloads the selected model (~150MB for base) to ~/.cache/whisper by default; later runs reuse the cached copy
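A quick preflight check can catch a missing `whisper` or `ffmpeg` before a long batch starts. The sketch below uses the standard `command -v` builtin. `check_requirements` is a hypothetical name, and it only confirms that the commands are on `PATH`, not which versions are installed.

```bash
# Hypothetical preflight check: report every named command that is missing
# from PATH and return non-zero if at least one is missing.
check_requirements() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Typical use: `check_requirements whisper ffmpeg || echo "install openai-whisper and ffmpeg first"`.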