speechall-cli

CLI for speech-to-text transcription via the Speechall API. Supports multiple providers (OpenAI, Deepgram, AssemblyAI, Google, Gemini, Groq, ElevenLabs, Cloudflare, and more).

Installation

Homebrew (macOS and Linux)

brew install Speechall/tap/speechall

Without Homebrew: Download the binary for your platform from https://github.com/Speechall/speechall-cli/releases and place it on your PATH.

Verify

speechall --version

Authentication

An API key is required. Provide it via environment variable (preferred) or flag:

export SPEECHALL_API_KEY="your-key-here"
# or
speechall --api-key "your-key-here" audio.wav

Create an API key at https://speechall.com/console/api-keys
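Scripts that wrap the CLI can fail early when the key is missing. The following is a minimal sketch, not part of speechall itself: `require_api_key` is a hypothetical helper name, and it only checks that SPEECHALL_API_KEY is non-empty, not that the key is valid.

```shell
# Hypothetical helper (not shipped with speechall):
# return non-zero when SPEECHALL_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${SPEECHALL_API_KEY:-}" ]; then
    echo "SPEECHALL_API_KEY is not set; export it or pass --api-key" >&2
    return 1
  fi
}
```

A script could then run `require_api_key && speechall audio.wav` so that a missing key is reported before any request is attempted.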

Commands

transcribe (default)

Transcribe an audio or video file. This is the default subcommand, so speechall audio.wav is equivalent to speechall transcribe audio.wav.

speechall <file> [options]

Options:

| Flag | Description | Default |
|---|---|---|
| --model <provider.model> | STT model identifier | openai.gpt-4o-mini-transcribe |
| --language <code> | Language code (e.g. en, tr, de) | API default (auto-detect) |
| --output-format <format> | Output format (text, json, verbose_json, srt, vtt) | API default |
| --diarization | Enable speaker diarization | off |
| --speakers-expected <n> | Expected number of speakers (use with --diarization) | — |
| --no-punctuation | Disable automatic punctuation | — |
| --temperature <0.0-1.0> | Model temperature | — |
| --initial-prompt <text> | Text prompt to guide model style | — |
| --custom-vocabulary <term> | Terms to boost recognition (repeatable) | — |
| --ruleset-id <uuid> | Replacement ruleset UUID | — |
| --api-key <key> | API key (overrides SPEECHALL_API_KEY env var) | — |

Examples:

# Basic transcription
speechall interview.mp3

# Specific model and language
speechall call.wav --model deepgram.nova-2 --language en

# Speaker diarization with SRT output
speechall meeting.wav --diarization --speakers-expected 3 --output-format srt

# Custom vocabulary for domain-specific terms
speechall medical.wav --custom-vocabulary "myocardial" --custom-vocabulary "infarction"

# Transcribe a video file (on macOS, audio is extracted automatically)
speechall presentation.mp4
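The single-file examples above extend naturally to a batch job. This is a sketch under two assumptions that the text above does not confirm: that speechall writes the transcript to stdout, and that it exits non-zero on failure. `transcribe_dir` is a hypothetical helper name; only the --output-format flag from the options table is used.

```shell
# Hypothetical helper: transcribe every .wav file in a directory to SRT.
# Assumption: speechall prints the transcript to stdout.
transcribe_dir() {
  for f in "$1"/*.wav; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    # Write the subtitle file next to the audio: talk.wav -> talk.srt
    speechall "$f" --output-format srt > "${f%.wav}.srt" || return 1
  done
}
```

Calling `transcribe_dir ./recordings` would stop at the first file that fails, leaving earlier .srt files in place.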

models

List available speech-to-text models. Outputs JSON to stdout. Filters combine with AND logic.

speechall models [options]

Filter flags:

| Flag | Description |
|---|---|
| --provider <name> | Filter by provider (e.g. openai, deepgram) |
| --language <code> | Filter by supported language (tr matches tr, tr-TR, tr-CY) |
| --diarization | Only models supporting speaker diarization |
| --srt | Only models supporting SRT output |
| --vtt | Only models supporting VTT output |
| --punctuation | Only models supporting automatic punctuation |
| --streamable | Only models supporting real-time streaming |
| --vocabulary | Only models supporting custom vocabulary |
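Because filters combine with AND logic, passing several flags narrows the result. The sketch below wraps one such combination; `models_with_diarization` is a hypothetical helper name, and nothing is assumed about the shape of the JSON beyond it being written to stdout.

```shell
# Hypothetical helper: list models from one provider that support both a
# given language and speaker diarization (all three filters must match).
models_with_diarization() {
  speechall models --provider "$1" --language "$2" --diarization
}
```

For example, `models_with_diarization deepgram tr > models.json` would save the matching models for later inspection.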