GPT Realtime Whisper

OpenAI streaming speech-to-text model for realtime transcription

Published

GPT Realtime Whisper is an OpenAI speech-to-text model for transcription, captions, voice input and audio content processing.

Overview

GPT Realtime Whisper is a speech-to-text model in the official OpenAI model catalog, with model ID gpt-realtime-whisper. Its core job is turning audio into text for transcription, captions, voice input and audio processing workflows.

Best for

Use GPT Realtime Whisper for meeting recordings, podcasts, support calls, voice input and captions. Before production, test it against noisy audio, accented speech, multilingual content, domain terminology and long recordings to confirm accuracy and stability.
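One common way to run the pre-production checks above is to compare model transcripts against human-written reference transcripts using word error rate (WER). The sketch below is a minimal, self-contained illustration and not part of any OpenAI tooling. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for this example; a real evaluation would likely also normalize punctuation and numerals.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words.

    Hypothetical helper for illustration. Normalization here is only
    lowercasing and whitespace splitting (an assumption of this sketch).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sample pair: one substituted word out of five.
    print(word_error_rate("please reset my router password",
                          "please reset my rooter password"))
```

Computing this separately for each test category (noisy audio, accents, domain terms) makes it easier to see where accuracy drops.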

Use cases

Speech-to-text transcription
Meeting notes and captions
Voice input and content search
Support call and podcast processing

Strengths

Focused on speech recognition
Useful for structured audio content
Pairs well with TTS or realtime models
Works for batch or realtime voice input

Limitations

Does not generate spoken output
Noise, accents and domain terms affect accuracy
Long audio requires stability and cost checks
Translation and summarization usually need downstream models
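
For the long-audio point above, one possible approach is to process a recording in fixed-length windows with a small overlap, then count how many seconds of audio would actually be submitted. The sketch below only plans the windows. It does not call any API, and it assumes nothing about how this model handles long input or how it is billed. The function name `plan_chunks` and the default window and overlap values are hypothetical choices for illustration.

```python
def plan_chunks(total_seconds: float,
                chunk_seconds: float = 60.0,
                overlap_seconds: float = 2.0) -> list[tuple[float, float]]:
    """Return (start, end) windows in seconds that cover the recording.

    Hypothetical helper: defaults are illustrative, not recommendations.
    """
    if chunk_seconds <= 0 or not (0 <= overlap_seconds < chunk_seconds):
        raise ValueError("need chunk_seconds > 0 and 0 <= overlap < chunk")
    if total_seconds <= 0:
        return []

    step = chunk_seconds - overlap_seconds
    chunks = []
    start = 0.0
    while True:
        end = min(start + chunk_seconds, total_seconds)
        chunks.append((start, end))
        if end >= total_seconds:
            break
        start += step  # the next window re-reads the last `overlap` seconds
    return chunks


def submitted_seconds(chunks: list[tuple[float, float]]) -> float:
    """Total audio seconds across all windows, overlap included."""
    return sum(end - start for start, end in chunks)


if __name__ == "__main__":
    windows = plan_chunks(150.0)
    print(windows)
    print(submitted_seconds(windows))
```

Comparing `submitted_seconds` with the raw recording length shows how much extra audio the overlap adds, which is one input to a cost check.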

References

https://platform.openai.com/docs/models