GPT-4o

OpenAI flagship multimodal model for text, vision and real-time interaction

Published

scheduleReleased：May 13, 2024

GPT-4o is OpenAI's general-purpose flagship multimodal model. It is suitable for products that need strong text understanding, image analysis, code assistance, tool calling and stable conversation quality at scale.

starsCapabilities

visibilityVision understandingcodeFunction callingdata_objectStructured output

paymentsContext and pricing

Context limit128,000

Max output16,384

Knowledge cutoff2023-09

Input price$2.5/ 1M tokens

Output price$10/ 1M tokens

Cached input price$1.25/ 1M tokens

descriptionOverview

Overview

GPT-4o is a balanced default choice for many production AI features. It combines strong language understanding, multimodal input and mature API ecosystem support.

Best for

Use GPT-4o when you need a reliable general model for assistants, content workflows, document understanding, image Q&A or tool-assisted automation.

lightbulbUse cases

Customer support assistants
Code generation and review
Image understanding and multimodal Q&A
Document summarization and extraction

thumb_upStrengths

Mature multimodal capability
Strong tool calling ecosystem
Stable general task performance
Good default production model

infoLimitations

More expensive than lightweight models
Deep reasoning tasks may require a dedicated reasoning model
Needs product-side safety controls

compare_arrowsAlternative models

Claude Sonnet 4.5Claude Sonnet model optimized for coding, agents and complex workflows

Gemini 2.5 ProGoogle advanced multimodal model for long-context and reasoning-heavy tasks

Qwen MaxQwen high-capability model for advanced Chinese and coding tasks

linkReferences

open_in_newhttps://platform.openai.com/docs/models/gpt-4o