Category: Data & Analytics
Requires API Key

Speechall command-line tool for fast speech-to-text transcription using multiple providers

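Install and use the speechall CLI for speech-to-text transcription. Use cases: (1) transcribing audio and video files to text; (2) installing on macOS or Linux; (3) listing available STT models and their capabilities; (4) using speaker diarization, subtitles, and other features from the terminal. Trigger words: speechall, audio transcription CLI, command-line speech-to-text.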

Author: atacanhubclawhub

speechall-cli

CLI for speech-to-text transcription via the Speechall API. Supports multiple providers (OpenAI, Deepgram, AssemblyAI, Google, Gemini, Groq, ElevenLabs, Cloudflare, and more).

Installation

Homebrew (macOS and Linux)

brew install Speechall/tap/speechall

Without Homebrew: Download the binary for your platform from https://github.com/Speechall/speechall-cli/releases and place it on your PATH.
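Installing by hand comes down to copying the binary into a directory on your PATH and making it executable. The sketch below wraps that in a hypothetical shell function, install_speechall_binary (not part of speechall). It assumes you have already downloaded and extracted the binary for your platform, and it defaults to ~/.local/bin, a common user bin directory that the speechall docs do not prescribe.

```shell
# Hypothetical helper (not part of speechall): copy a downloaded and
# extracted speechall binary into a bin directory and make it executable.
# The default directory ~/.local/bin is an assumption, not a documented location.
install_speechall_binary() {
  local src="$1"
  local dest_dir="${2:-$HOME/.local/bin}"
  mkdir -p "$dest_dir" || return 1
  install -m 0755 "$src" "$dest_dir/speechall" || return 1
  # Warn (on stderr) if the target directory is not on PATH yet.
  case ":$PATH:" in
    *":$dest_dir:"*) ;;
    *) echo "note: add $dest_dir to your PATH" >&2 ;;
  esac
}
```

Example: install_speechall_binary ./speechall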

Verify

speechall --version

Authentication

An API key is required. Provide it via the SPEECHALL_API_KEY environment variable (preferred) or the --api-key flag:

export SPEECHALL_API_KEY="your-key-here"
# or
speechall --api-key "your-key-here" audio.wav

Create an API key at https://speechall.com/console/api-keys.
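In scripts, checking for the key before calling the CLI makes a missing key fail fast with a clear message. A minimal sketch; require_speechall_key is a hypothetical helper, not part of speechall, and it only checks the environment variable, so skip it when you pass --api-key instead.

```shell
# Hypothetical helper (not part of speechall): fail fast with a clear
# message when no API key is available in the environment.
require_speechall_key() {
  if [ -z "${SPEECHALL_API_KEY:-}" ]; then
    echo "SPEECHALL_API_KEY is not set; create a key at https://speechall.com/console/api-keys" >&2
    return 1
  fi
}
```

Typical use: require_speechall_key && speechall audio.wav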

Commands

transcribe (default)

Transcribe an audio or video file. This is the default subcommand, so speechall audio.wav is equivalent to speechall transcribe audio.wav.

speechall <file> [options]

Options:

| Flag | Description | Default |
|---|---|---|
| --model <provider.model> | STT model identifier | openai.gpt-4o-mini-transcribe |
| --language <code> | Language code (e.g. en, tr, de) | API default (auto-detect) |
| --output-format <format> | Output format (text, json, verbose_json, srt, vtt) | API default |
| --diarization | Enable speaker diarization | off |
| --speakers-expected <n> | Expected number of speakers (use with --diarization) | not set |
| --no-punctuation | Disable automatic punctuation | not set |
| --temperature <0.0-1.0> | Model temperature | not set |
| --initial-prompt <text> | Text prompt to guide model style | not set |
| --custom-vocabulary <term> | Terms to boost recognition (repeatable) | not set |
| --ruleset-id <uuid> | Replacement ruleset UUID | not set |
| --api-key <key> | API key (overrides SPEECHALL_API_KEY env var) | not set |

Examples:

# Basic transcription
speechall interview.mp3

# Specific model and language
speechall call.wav --model deepgram.nova-2 --language en

# Speaker diarization with SRT output
speechall meeting.wav --diarization --speakers-expected 3 --output-format srt

# Custom vocabulary for domain-specific terms
speechall medical.wav --custom-vocabulary "myocardial" --custom-vocabulary "infarction"

# Transcribe a video file (macOS extracts audio automatically)
speechall presentation.mp4
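The options above compose naturally in a loop. The sketch below uses a hypothetical transcribe_to_srt function (not part of speechall) that transcribes each file it is given into a sibling .srt file, using only the documented --model and --output-format flags. It assumes the chosen model supports SRT output (speechall models --srt lists those) and that file names have an extension.

```shell
# Hypothetical helper (not part of speechall): transcribe each file into a
# sibling subtitle file, e.g. meeting.wav -> meeting.srt.
# Usage: transcribe_to_srt <provider.model> <file>...
transcribe_to_srt() {
  local model="$1"
  shift
  local f out status=0
  for f in "$@"; do
    out="${f%.*}.srt"
    # The transcript arrives on stdout; CLI errors stay on stderr.
    if speechall "$f" --model "$model" --output-format srt > "$out"; then
      echo "wrote $out" >&2
    else
      echo "failed: $f" >&2
      rm -f "$out"   # do not leave an empty or partial .srt behind
      status=1
    fi
  done
  return "$status"
}
```

Example: transcribe_to_srt deepgram.nova-2 meeting1.wav meeting2.wav. The function returns 1 if any file failed, so it can gate later steps in a script.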

models

List available speech-to-text models. Outputs JSON to stdout. Filters combine with AND logic.

speechall models [options]

Filter flags:

| Flag | Description |
|---|---|
| --provider <name> | Filter by provider (e.g. openai, deepgram) |
| --language <code> | Filter by supported language (tr matches tr, tr-TR, tr-CY) |
| --diarization | Only models supporting speaker diarization |
| --srt | Only models supporting SRT output |
| --vtt | Only models supporting VTT output |
| --punctuation | Only models supporting automatic punctuation |
| --streamable | Only models supporting real-time streaming |
| --vocabulary | Only models supporting custom vocabulary |

Examples:

# List all available models
speechall models

# Models from a specific provider
speechall models --provider deepgram

# Models that support Turkish and diarization
speechall models --language tr --diarization

# Pipe to jq for specific fields
speechall models --provider openai | jq '.[].identifier'
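Because models writes JSON to stdout, its filters can drive model selection in a script. A minimal sketch with a hypothetical first_model helper (not part of speechall); it requires jq and assumes, as the jq example above implies, that the output is a JSON array of objects carrying an identifier field.

```shell
# Hypothetical helper (not part of speechall): print the identifier of the
# first model matching the given `speechall models` filter flags, or nothing
# if no model matches. Assumes a JSON array of objects with an "identifier"
# field; requires jq.
first_model() {
  speechall models "$@" | jq -r '.[0].identifier // empty'
}
```

Example: model="$(first_model --language tr --diarization)", then, if $model is non-empty, speechall meeting.wav --model "$model" --language tr --diarization. Which model comes first depends on the order in which the API returns them.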

Tips

  • On macOS, video files (.mp4, .mov, etc.) are automatically converted to audio before upload.
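  • On Linux, pass audio files (.wav, .mp3, .m4a, .flac, etc.) directly; the automatic video-to-audio conversion described above happens on macOS.
  • Output goes to stdout. Redirect it to save a transcript: speechall audio.wav > transcript.txt
  • Errors go to stderr, so piping or redirecting stdout captures only the transcript.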
  • Run speechall --help, speechall transcribe --help, or speechall models --help to see all valid enum values for model identifiers, language codes, and output formats.
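On Linux, one way to handle a video file is to extract its audio track first. The sketch below assumes ffmpeg is installed (it is a separate tool, not part of speechall); transcribe_video_linux is a hypothetical helper that also keeps stdout and stderr in separate files, following the tips above.

```shell
# Hypothetical helper (not part of speechall). Assumes ffmpeg is installed.
# talk.mp4 -> talk.txt (transcript, from stdout) and talk.err (errors, from stderr)
transcribe_video_linux() {
  local video="$1"
  local base="${video%.*}"
  local wav="${base}.audio.wav"
  local rc=0
  # -vn drops the video stream; -ac 1 -ar 16000 produce mono 16 kHz WAV,
  # a common speech-to-text input (our choice, not a documented requirement).
  ffmpeg -y -loglevel error -i "$video" -vn -ac 1 -ar 16000 "$wav" || return 1
  speechall "$wav" > "${base}.txt" 2> "${base}.err" || rc=$?
  rm -f "$wav"   # remove the intermediate audio file
  return "$rc"
}
```

Example: transcribe_video_linux presentation.mp4 writes presentation.txt and presentation.err next to the video.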