AI Model Encyclopedia

Model Wiki

Explore major large language models with capability notes, technical parameters, use cases and trade-offs

In-depth model profiles

Each entry summarizes capabilities, ideal use cases, limitations and reference sources

Google

Google | gemini-flash

Gemini 3.5 Flash

Gemini 3.5 Flash is a gemini-flash model from Google for assistants, generation and automation tasks

Gemini 3.5 Flash is a large language model from Google. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

Released May 2026
Google | gemini-flash-lite

Gemini 3.1 Flash Lite

Gemini 3.1 Flash Lite is a gemini-flash-lite model from Google for assistants, generation and automation tasks

Gemini 3.1 Flash Lite is a large language model from Google, with an approximate context window of 1,048,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $0.25/1M tokens | Output $1.5/1M tokens | Released May 2026
Google | gemini-flash-lite

Gemini 3.1 Flash Lite Preview

Gemini 3.1 Flash Lite Preview is a gemini-flash-lite model from Google for assistants, generation and automation tasks

Gemini 3.1 Flash Lite Preview is a large language model from Google, with an approximate context window of 1,048,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $0.25/1M tokens | Output $1.5/1M tokens | Released March 2026
Google | gemini-flash

Nano Banana 2

Nano Banana 2 is a gemini-flash model from Google for assistants, generation and automation tasks

Nano Banana 2 is a large language model from Google, with an approximate context window of 131,072 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

66K context | Multimodal | Input $0.5/1M tokens | Output $60/1M tokens | Released February 2026
Google | gemini-pro

Gemini 3.1 Pro Preview Custom Tools

Gemini 3.1 Pro Preview Custom Tools is a gemini-pro model from Google for assistants, generation and automation tasks

Gemini 3.1 Pro Preview Custom Tools is a large language model from Google, with an approximate context window of 1,048,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $2/1M tokens | Output $12/1M tokens | Released February 2026
Google | gemini-pro

Gemini 3.1 Pro Preview

Gemini 3.1 Pro Preview is a gemini-pro model from Google for assistants, generation and automation tasks

Gemini 3.1 Pro Preview is a large language model from Google, with an approximate context window of 1,048,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $2/1M tokens | Output $12/1M tokens | Released February 2026
Google | gemini-flash

Gemini 3 Flash Preview

Gemini 3 Flash Preview is a gemini-flash model from Google for assistants, generation and automation tasks

Gemini 3 Flash Preview is a large language model from Google, with an approximate context window of 1,048,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $0.5/1M tokens | Output $3/1M tokens | Released December 2025
Google | gemini-pro

Gemini 3 Pro Preview

Gemini 3 Pro Preview is a gemini-pro model from Google for assistants, generation and automation tasks

Gemini 3 Pro Preview is a large language model from Google, with an approximate context window of 1,000,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $2/1M tokens | Output $12/1M tokens | Released November 2025
Google | gemini-flash

Gemini Flash Latest

Gemini Flash Latest is a gemini-flash model from Google for assistants, generation and automation tasks

Gemini Flash Latest is a large language model from Google, with an approximate context window of 1,048,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $0.3/1M tokens | Output $2.5/1M tokens | Released September 2025
Google | gemini-flash-lite

Gemini Flash-Lite Latest

Gemini Flash-Lite Latest is a gemini-flash-lite model from Google for assistants, generation and automation tasks

Gemini Flash-Lite Latest is a large language model from Google, with an approximate context window of 1,048,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $0.1/1M tokens | Output $0.4/1M tokens | Released September 2025
Google | gemini-flash

Nano Banana

Nano Banana is a gemini-flash model from Google for assistants, generation and automation tasks

Nano Banana is a large language model from Google, with an approximate context window of 32,768 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

33K context | Multimodal | Input $0.3/1M tokens | Output $30/1M tokens | Released August 2025
Google | gemini-flash-lite

Gemini 2.5 Flash-Lite

Gemini 2.5 Flash-Lite is a gemini-flash-lite model from Google for assistants, generation and automation tasks

Gemini 2.5 Flash-Lite is a large language model from Google, with an approximate context window of 1,048,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $0.1/1M tokens | Output $0.4/1M tokens | Released June 2025
Google | gemini

Gemini Embedding 001

Gemini Embedding 001 is a gemini model from Google for assistants, generation and automation tasks

Gemini Embedding 001 is a text embedding model from Google, with an approximate context window of 2,048 tokens. It can be evaluated for the retrieval and semantic-search components of assistants, knowledge Q&A and business automation. Pricing and availability may vary by upstream provider or relay service.

2K context | Input $0.15/1M tokens | Output $0/1M tokens | Released May 2025
Google | gemini-flash

Gemini 2.5 Pro Preview TTS

Gemini 2.5 Pro Preview TTS is a gemini-flash model from Google for assistants, generation and automation tasks

Gemini 2.5 Pro Preview TTS is a large language model from Google, with an approximate context window of 8,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

8K context | Input $1/1M tokens | Output $20/1M tokens | Released May 2025
Google | gemini-flash

Gemini 2.5 Flash Preview TTS

Gemini 2.5 Flash Preview TTS is a gemini-flash model from Google for assistants, generation and automation tasks

Gemini 2.5 Flash Preview TTS is a large language model from Google, with an approximate context window of 8,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

8K context | Input $0.5/1M tokens | Output $10/1M tokens | Released May 2025
Google | gemini-pro

Gemini 2.5 Pro

Gemini 2.5 Pro is a gemini-pro model from Google for assistants, generation and automation tasks

Gemini 2.5 Pro is a large language model from Google, with an approximate context window of 1,048,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $1.25/1M tokens | Output $10/1M tokens | Released March 2025
Google | gemini-flash

Gemini 2.5 Flash

Gemini 2.5 Flash is a gemini-flash model from Google for assistants, generation and automation tasks

Gemini 2.5 Flash is a large language model from Google, with an approximate context window of 1,048,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $0.3/1M tokens | Output $2.5/1M tokens | Released March 2025
Google | gemini-pro

Gemini 2.5 Pro

Google advanced multimodal model for long-context and reasoning-heavy tasks

Gemini 2.5 Pro is Google's high-capability Gemini model, useful for complex reasoning, long-context analysis and multimodal applications across text and media inputs.

1049K context | Multimodal | Input $1.25/1M tokens | Output $10/1M tokens | Released March 2025
Google | gemini-flash

Gemini 2.5 Flash

Fast Gemini model for low-latency multimodal and high-throughput tasks

Gemini 2.5 Flash is optimized for speed and efficiency, making it suitable for interactive products, lightweight reasoning and high-volume calls.

1049K context | Multimodal | Input $0.3/1M tokens | Output $2.5/1M tokens | Released March 2025
Google | gemini-flash-lite

Gemini 2.0 Flash-Lite

Gemini 2.0 Flash-Lite is a gemini-flash-lite model from Google for assistants, generation and automation tasks

Gemini 2.0 Flash-Lite is a large language model from Google, with an approximate context window of 1,048,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $0.075/1M tokens | Output $0.3/1M tokens | Released December 2024
Google | gemini-flash

Gemini 2.0 Flash

Gemini 2.0 Flash is a gemini-flash model from Google for assistants, generation and automation tasks

Gemini 2.0 Flash is a large language model from Google, with an approximate context window of 1,048,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1049K context | Multimodal | Input $0.1/1M tokens | Output $0.4/1M tokens | Released December 2024
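
The per-1M-token prices listed for each entry can be turned into a rough request cost with simple arithmetic. The following is a minimal Python sketch, not part of any provider's API: the function name `estimate_cost` and the token counts are hypothetical, and the prices are the ones listed for Gemini 2.0 Flash above, which may differ by upstream provider or relay service.

```python
from decimal import Decimal

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)  # listings quote prices per 1M tokens


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: str, output_price_per_m: str) -> Decimal:
    """Estimate the cost of one request in the listing's currency.

    Prices are passed as strings so Decimal keeps them exact.
    """
    input_cost = Decimal(input_tokens) * Decimal(input_price_per_m) / TOKENS_PER_PRICE_UNIT
    output_cost = Decimal(output_tokens) * Decimal(output_price_per_m) / TOKENS_PER_PRICE_UNIT
    return input_cost + output_cost


# Hypothetical workload: 200,000 input tokens and 50,000 output tokens,
# priced at the listed $0.1 (input) and $0.4 (output) per 1M tokens.
cost = estimate_cost(200_000, 50_000, "0.1", "0.4")
print(f"Estimated cost: ${cost}")
```

Because output tokens are usually priced several times higher than input tokens, the output share can dominate the total even when the response is much shorter than the prompt. Entries priced in different currencies (for example ¥ and $) need a currency conversion before they can be compared this way.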

DeepSeek

DeepSeek | deepseek-thinking

DeepSeek V4 Pro

DeepSeek V4 Pro is a deepseek-thinking model from DeepSeek for assistants, generation and automation tasks

DeepSeek V4 Pro is a large language model from DeepSeek, with an approximate context window of 1,000,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1000K context | Input ¥3/1M tokens | Output ¥6/1M tokens | Released April 2026
DeepSeek | deepseek-flash

DeepSeek V4 Flash

DeepSeek V4 Flash is a deepseek-flash model from DeepSeek for assistants, generation and automation tasks

DeepSeek V4 Flash is a large language model from DeepSeek, with an approximate context window of 1,000,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1000K context | Input ¥1/1M tokens | Output ¥2/1M tokens | Released April 2026
DeepSeek | deepseek-thinking

DeepSeek Reasoner

DeepSeek Reasoner is a deepseek-thinking model from DeepSeek for assistants, generation and automation tasks

DeepSeek Reasoner is a large language model from DeepSeek, with an approximate context window of 1,000,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1000K context | Input $0.14/1M tokens | Output $0.28/1M tokens | Released December 2025
DeepSeek | deepseek

DeepSeek Chat

DeepSeek Chat model profile for capabilities, pricing and use cases

DeepSeek Chat is a large language model from DeepSeek. This entry summarizes its positioning, typical use cases, strengths, limitations and related pricing signals for quick comparison.

1000K context | Input $0.14/1M tokens | Output $0.28/1M tokens | Released December 2025

OpenAI

OpenAI | gpt-pro

GPT-5.5 Pro

GPT-5.5 Pro is a gpt-pro model from OpenAI for assistants, generation and automation tasks

GPT-5.5 Pro is a large language model from OpenAI, with an approximate context window of 1,050,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1050K context | Multimodal | Input $30/1M tokens | Output $180/1M tokens | Released April 2026
OpenAI | gpt-pro

GPT-5.4 Pro

GPT-5.4 Pro is a gpt-pro model from OpenAI for assistants, generation and automation tasks

GPT-5.4 Pro is a large language model from OpenAI, with an approximate context window of 1,050,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1050K context | Multimodal | Input $30/1M tokens | Output $180/1M tokens | Released March 2026
OpenAI | gpt

GPT-5.3 Chat (latest)

GPT-5.3 Chat (latest) is a gpt model from OpenAI for assistants, generation and automation tasks

GPT-5.3 Chat (latest) is a large language model from OpenAI, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Multimodal | Input $1.75/1M tokens | Output $14/1M tokens | Released March 2026
OpenAI | gpt-codex-spark

GPT-5.3 Codex Spark

GPT-5.3 Codex Spark is a gpt-codex-spark model from OpenAI for assistants, generation and automation tasks

GPT-5.3 Codex Spark is a large language model from OpenAI, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Multimodal | Input $1.75/1M tokens | Output $14/1M tokens | Released February 2026
OpenAI | gpt-codex

GPT-5.3 Codex

GPT-5.3 Codex is a gpt-codex model from OpenAI for assistants, generation and automation tasks

GPT-5.3 Codex is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $1.75/1M tokens | Output $14/1M tokens | Released February 2026
OpenAI | gpt

GPT-5.2

GPT-5.2 is a gpt model from OpenAI for assistants, generation and automation tasks

GPT-5.2 is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $1.75/1M tokens | Output $14/1M tokens | Released December 2025
OpenAI | gpt-pro

GPT-5.2 Pro

GPT-5.2 Pro is a gpt-pro model from OpenAI for assistants, generation and automation tasks

GPT-5.2 Pro is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $21/1M tokens | Output $168/1M tokens | Released December 2025
OpenAI | gpt-codex

GPT-5.2 Chat

GPT-5.2 Chat is a gpt-codex model from OpenAI for assistants, generation and automation tasks

GPT-5.2 Chat is a large language model from OpenAI, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Multimodal | Input $1.75/1M tokens | Output $14/1M tokens | Released December 2025
OpenAI | gpt-codex

GPT-5.2 Codex

GPT-5.2 Codex is a gpt-codex model from OpenAI for assistants, generation and automation tasks

GPT-5.2 Codex is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $1.75/1M tokens | Output $14/1M tokens | Released December 2025
OpenAI | gpt-codex

GPT-5.1 Codex mini

GPT-5.1 Codex mini is a gpt-codex model from OpenAI for assistants, generation and automation tasks

GPT-5.1 Codex mini is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $0.25/1M tokens | Output $2/1M tokens | Released November 2025
OpenAI | gpt-codex

GPT-5.1 Chat

GPT-5.1 Chat is a gpt-codex model from OpenAI for assistants, generation and automation tasks

GPT-5.1 Chat is a large language model from OpenAI, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Multimodal | Input $1.25/1M tokens | Output $10/1M tokens | Released November 2025
OpenAI | gpt

GPT-5.1

GPT-5.1 is a gpt model from OpenAI for assistants, generation and automation tasks

GPT-5.1 is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $1.25/1M tokens | Output $10/1M tokens | Released November 2025
OpenAI | gpt-codex

GPT-5.1 Codex Max

GPT-5.1 Codex Max is a gpt-codex model from OpenAI for assistants, generation and automation tasks

GPT-5.1 Codex Max is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $1.25/1M tokens | Output $10/1M tokens | Released November 2025
OpenAI | gpt-codex

GPT-5.1 Codex

GPT-5.1 Codex is a gpt-codex model from OpenAI for assistants, generation and automation tasks

GPT-5.1 Codex is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $1.25/1M tokens | Output $10/1M tokens | Released November 2025
OpenAI | gpt-pro

GPT-5 Pro

GPT-5 Pro is a gpt-pro model from OpenAI for assistants, generation and automation tasks

GPT-5 Pro is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $15/1M tokens | Output $120/1M tokens | Released October 2025
OpenAI | gpt-codex

GPT-5-Codex

GPT-5-Codex is a gpt-codex model from OpenAI for assistants, generation and automation tasks

GPT-5-Codex is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $1.25/1M tokens | Output $10/1M tokens | Released September 2025
OpenAI | gpt-codex

GPT-5 Chat (latest)

GPT-5 Chat (latest) is a gpt-codex model from OpenAI for assistants, generation and automation tasks

GPT-5 Chat (latest) is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $1.25/1M tokens | Output $10/1M tokens | Released August 2025
OpenAI | gpt-nano

GPT-5 Nano

GPT-5 Nano is a gpt-nano model from OpenAI for assistants, generation and automation tasks

GPT-5 Nano is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $0.05/1M tokens | Output $0.4/1M tokens | Released August 2025
OpenAI | gpt

GPT-5

GPT-5 is a gpt model from OpenAI for assistants, generation and automation tasks

GPT-5 is a large language model from OpenAI, with an approximate context window of 400,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

400K context | Multimodal | Input $1.25/1M tokens | Output $10/1M tokens | Released August 2025
OpenAI | gpt-mini

GPT-5 Mini

GPT-5 Mini model profile for capabilities, pricing and use cases

GPT-5 Mini is a large language model from OpenAI. This entry summarizes its positioning, typical use cases, strengths, limitations and related pricing signals for quick comparison.

400K context | Multimodal | Input $0.25/1M tokens | Output $2/1M tokens | Released August 2025
OpenAI | o-pro

o3-pro

o3-pro is an o-pro model from OpenAI for assistants, generation and automation tasks

o3-pro is a large language model from OpenAI, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context | Multimodal | Input $20/1M tokens | Output $80/1M tokens | Released June 2025
OpenAI | o-mini

o4-mini

o4-mini is an o-mini model from OpenAI for assistants, generation and automation tasks

o4-mini is a large language model from OpenAI, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context | Multimodal | Input $1.1/1M tokens | Output $4.4/1M tokens | Released April 2025
OpenAI | o

o3

o3 is an o model from OpenAI for assistants, generation and automation tasks

o3 is a large language model from OpenAI, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context | Multimodal | Input $2/1M tokens | Output $8/1M tokens | Released April 2025
OpenAI | gpt

GPT-4.1

GPT-4.1 is a gpt model from OpenAI for assistants, generation and automation tasks

GPT-4.1 is a large language model from OpenAI, with an approximate context window of 1,047,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1048K context | Multimodal | Input $2/1M tokens | Output $8/1M tokens | Released April 2025
OpenAI | gpt-mini

GPT-4.1 mini

GPT-4.1 mini is a gpt-mini model from OpenAI for assistants, generation and automation tasks

GPT-4.1 mini is a large language model from OpenAI, with an approximate context window of 1,047,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1048K context | Multimodal | Input $0.4/1M tokens | Output $1.6/1M tokens | Released April 2025
OpenAI | gpt-nano

GPT-4.1 nano

GPT-4.1 nano is a gpt-nano model from OpenAI for assistants, generation and automation tasks

GPT-4.1 nano is a large language model from OpenAI, with an approximate context window of 1,047,576 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1048K context | Multimodal | Input $0.1/1M tokens | Output $0.4/1M tokens | Released April 2025
OpenAI | o-pro

o1-pro

o1-pro is an o-pro model from OpenAI for assistants, generation and automation tasks

o1-pro is a large language model from OpenAI, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context | Multimodal | Input $150/1M tokens | Output $600/1M tokens | Released March 2025
OpenAI | o-mini

o3-mini

o3-mini model profile for capabilities, pricing and use cases

o3-mini is a large language model from OpenAI. This entry summarizes its positioning, typical use cases, strengths, limitations and related pricing signals for quick comparison.

200K context | Input $1.1/1M tokens | Output $4.4/1M tokens | Released December 2024
OpenAI | o

o1

o1 model profile for capabilities, pricing and use cases

o1 is a large language model from OpenAI. This entry summarizes its positioning, typical use cases, strengths, limitations and related pricing signals for quick comparison.

200K context | Multimodal | Input $15/1M tokens | Output $60/1M tokens | Released December 2024
OpenAI | gpt

GPT-4o (2024-11-20)

GPT-4o (2024-11-20) is a gpt model from OpenAI for assistants, generation and automation tasks

GPT-4o (2024-11-20) is a large language model from OpenAI, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Multimodal | Input $2.5/1M tokens | Output $10/1M tokens | Released November 2024
OpenAI | o

o1-preview

o1-preview is an o model from OpenAI for assistants, generation and automation tasks

o1-preview is a large language model from OpenAI, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Input $15/1M tokens | Output $60/1M tokens | Released September 2024
OpenAI | o-mini

o1-mini

o1-mini model profile for capabilities, pricing and use cases

o1-mini is a large language model from OpenAI. This entry summarizes its positioning, typical use cases, strengths, limitations and related pricing signals for quick comparison.

128K context | Input $1.1/1M tokens | Output $4.4/1M tokens | Released September 2024
OpenAI | gpt

GPT-4o (2024-08-06)

GPT-4o (2024-08-06) is a gpt model from OpenAI for assistants, generation and automation tasks

GPT-4o (2024-08-06) is a large language model from OpenAI, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Multimodal | Input $2.5/1M tokens | Output $10/1M tokens | Released August 2024
OpenAI | gpt-mini

GPT-4o mini

Compact OpenAI multimodal model for high-volume, cost-sensitive workloads

GPT-4o mini is a lightweight OpenAI model designed for lower-cost, high-throughput applications while keeping useful multimodal and tool-assisted capabilities.

128K context | Multimodal | Input $0.15/1M tokens | Output $0.6/1M tokens | Released July 2024
OpenAI | o-mini

o4-mini-deep-research

o4-mini-deep-research is an o-mini model from OpenAI for assistants, generation and automation tasks

o4-mini-deep-research is a large language model from OpenAI, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context | Multimodal | Input $2/1M tokens | Output $8/1M tokens | Released June 2024
OpenAI | o

o3-deep-research

o3-deep-research is an o model from OpenAI for assistants, generation and automation tasks

o3-deep-research is a large language model from OpenAI, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context | Multimodal | Input $10/1M tokens | Output $40/1M tokens | Released June 2024
OpenAI | gpt

GPT-4o (2024-05-13)

GPT-4o (2024-05-13) is a gpt model from OpenAI for assistants, generation and automation tasks

GPT-4o (2024-05-13) is a large language model from OpenAI, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Multimodal | Input $5/1M tokens | Output $15/1M tokens | Released May 2024
OpenAI | gpt

GPT-4o

OpenAI flagship multimodal model for text, vision and real-time interaction

GPT-4o is OpenAI's general-purpose flagship multimodal model. It is suitable for products that need strong text understanding, image analysis, code assistance, tool calling and stable conversation quality at scale.

128K context | Multimodal | Input $2.5/1M tokens | Output $10/1M tokens | Released May 2024
OpenAI | text-embedding

text-embedding-3-large

text-embedding-3-large is a text-embedding model from OpenAI for assistants, generation and automation tasks

text-embedding-3-large is a text embedding model from OpenAI, with an approximate context window of 8,191 tokens. It can be evaluated for the retrieval and semantic-search components of assistants, knowledge Q&A and business automation. Pricing and availability may vary by upstream provider or relay service.

8K context | Input $0.13/1M tokens | Output $0/1M tokens | Released January 2024
OpenAI | text-embedding

text-embedding-3-small

text-embedding-3-small is a text-embedding model from OpenAI for semantic search, retrieval and clustering tasks

text-embedding-3-small is a text embedding model from OpenAI, with an approximate maximum input of 8,191 tokens. It can be evaluated for semantic search, retrieval-augmented generation, clustering and classification. Pricing and availability may vary by upstream provider or relay service.

8K context · Input $0.02/1M tokens · Output $0/1M tokens · Released January 2024
OpenAI · gpt

GPT-4 Turbo

GPT-4 Turbo is a gpt model from OpenAI for assistants, generation and automation tasks

GPT-4 Turbo is a large language model from OpenAI, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context · Multimodal · Input $10/1M tokens · Output $30/1M tokens · Released November 2023
OpenAI · gpt

GPT-4

GPT-4 is a gpt model from OpenAI for assistants, generation and automation tasks

GPT-4 is a large language model from OpenAI, with an approximate context window of 8,192 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

8K context · Input $30/1M tokens · Output $60/1M tokens · Released March 2023
OpenAI · gpt

GPT-3.5-turbo

GPT-3.5-turbo is a gpt model from OpenAI for assistants, generation and automation tasks

GPT-3.5-turbo is a large language model from OpenAI, with an approximate context window of 16,385 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

16K context · Input $0.5/1M tokens · Output $1.5/1M tokens · Released March 2023
OpenAI · text-embedding

text-embedding-ada-002

text-embedding-ada-002 is a text-embedding model from OpenAI for semantic search, retrieval and clustering tasks

text-embedding-ada-002 is a text embedding model from OpenAI, with an approximate maximum input of 8,192 tokens. It can be evaluated for semantic search, retrieval-augmented generation, clustering and classification. Pricing and availability may vary by upstream provider or relay service.

8K context · Input $0.1/1M tokens · Output $0/1M tokens · Released December 2022
OpenAI · gpt-mini

GPT-5.4 nano

Tiny GPT-5.4 variant for very low-cost and high-throughput tasks

GPT-5.4 nano is an OpenAI general-purpose model for batch tagging, simple extraction, routing, preprocessing and extremely cost-sensitive workloads.

400K context · Multimodal · Input $0.2/1M tokens · Output $1.25/1M tokens
OpenAI · gpt

GPT-5.4

High-capability OpenAI general model balancing quality and cost

GPT-5.4 is an OpenAI general-purpose model for product assistants, content generation, coding assistance, structured processing and multimodal understanding.

1050K context · Multimodal · Input $2.5/1M tokens · Output $15/1M tokens
OpenAI · gpt-mini

GPT-5.4 mini

Lower-latency and lower-cost GPT-5.4 variant

GPT-5.4 mini is an OpenAI general-purpose model for high-volume support, summarization, rewriting, lightweight classification and cost-sensitive automation.

400K context · Multimodal · Input $0.75/1M tokens · Output $4.5/1M tokens
OpenAI · gpt

GPT-5.5

OpenAI flagship general-purpose model for complex reasoning, coding and high-quality generation

GPT-5.5 is an OpenAI general-purpose model for hard Q&A, complex coding, long-form analysis and agent workflows.

1050K context · Multimodal · Input $5/1M tokens · Output $30/1M tokens
OpenAI · gpt-image

GPT Image 2

OpenAI image generation and editing model for high-quality visual creation

GPT Image 2 is an OpenAI image model for text-to-image generation, image editing, creative design and visual content workflows.

OpenAI · realtime

GPT Realtime 2

Reasoning-focused realtime voice model for low-latency audio interactions

GPT Realtime 2 is an OpenAI realtime model for low-latency voice input, voice output and interactive conversational experiences.

OpenAI · realtime-mini

GPT Realtime mini

Cost-efficient OpenAI realtime model for voice applications

GPT Realtime mini is an OpenAI realtime model for low-latency voice input, voice output and interactive conversational experiences.

OpenAI · realtime

GPT Realtime 1.5

OpenAI realtime voice model for audio input and audio output

GPT Realtime 1.5 is an OpenAI realtime model for low-latency voice input, voice output and interactive conversational experiences.

OpenAI · realtime

GPT Realtime Translate

OpenAI realtime model for streaming speech-to-speech translation

GPT Realtime Translate focuses on low-latency speech translation for cross-language calls, meetings and voice products.

OpenAI · realtime

GPT Realtime Whisper

OpenAI streaming speech-to-text model for realtime transcription

GPT Realtime Whisper is an OpenAI speech-to-text model for transcription, captions, voice input and audio content processing.

OpenAI · transcribe

GPT-4o Transcribe

Speech-to-text model powered by GPT-4o

GPT-4o Transcribe is an OpenAI speech-to-text model for transcription, captions, voice input and audio content processing.

OpenAI · transcribe

GPT-4o mini Transcribe

Cost-efficient speech-to-text model powered by GPT-4o mini

GPT-4o mini Transcribe is an OpenAI speech-to-text model for transcription, captions, voice input and audio content processing.

Moonshot (Kimi)

Moonshot (Kimi) · kimi-k2.6

Kimi K2.6

Kimi K2.6 is a kimi-k2.6 model from Moonshot AI for assistants, generation and automation tasks

Kimi K2.6 is a large language model from Moonshot AI, with an approximate context window of 262,144 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

262K context · Multimodal · Input $0.95/1M tokens · Output $4/1M tokens · Released April 2026
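The context badges on these cards appear to be the token count divided by 1,000 and rounded to the nearest whole number, so 262,144 tokens is shown as 262K. A small Python sketch of that conversion follows; the rounding rule is inferred from the listed values rather than documented anywhere on the page, and `context_badge` is a hypothetical helper name.

```python
def context_badge(context_tokens: int) -> str:
    """Format a context window in tokens as a 'NNNK context' badge.

    Assumption: the badge is tokens / 1000, rounded half up. Integer
    arithmetic is used so the result does not depend on float rounding.
    """
    thousands = (context_tokens + 500) // 1000
    return f"{thousands}K context"


# Token counts taken from card descriptions on this page.
for tokens in (262_144, 131_072, 8_191, 1_048_576):
    print(tokens, "->", context_badge(tokens))
```

When checking whether a prompt fits, the exact token count from the description is the safer figure to compare against, since the badge is rounded.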
Moonshot (Kimi) · kimi-k2.5

Kimi K2.5

Kimi K2.5 is a kimi-k2.5 model from Moonshot AI for assistants, generation and automation tasks

Kimi K2.5 is a large language model from Moonshot AI, with an approximate context window of 262,144 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

262K context · Multimodal · Input $0.6/1M tokens · Output $3/1M tokens · Released January 2026
Moonshot (Kimi) · kimi-thinking

Kimi K2 Thinking

Kimi K2 Thinking is a kimi-thinking model from Moonshot AI for assistants, generation and automation tasks

Kimi K2 Thinking is a large language model from Moonshot AI, with an approximate context window of 262,144 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

262K context · Input $0.6/1M tokens · Output $2.5/1M tokens · Released November 2025
Moonshot (Kimi) · kimi-thinking

Kimi K2 Thinking Turbo

Kimi K2 Thinking Turbo is a kimi-thinking model from Moonshot AI for assistants, generation and automation tasks

Kimi K2 Thinking Turbo is a large language model from Moonshot AI, with an approximate context window of 262,144 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

262K context · Input $1.15/1M tokens · Output $8/1M tokens · Released November 2025
Moonshot (Kimi) · kimi

Kimi K2 0905

Kimi K2 0905 is a kimi model from Moonshot AI for assistants, generation and automation tasks

Kimi K2 0905 is a large language model from Moonshot AI, with an approximate context window of 262,144 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

262K context · Input $0.6/1M tokens · Output $2.5/1M tokens · Released September 2025
Moonshot (Kimi) · kimi

Kimi K2 Turbo

Kimi K2 Turbo is a kimi model from Moonshot AI for assistants, generation and automation tasks

Kimi K2 Turbo is a large language model from Moonshot AI, with an approximate context window of 262,144 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

262K context · Input $2.4/1M tokens · Output $10/1M tokens · Released September 2025
Moonshot (Kimi) · kimi

Kimi K2 0711

Kimi K2 0711 is a kimi model from Moonshot AI for assistants, generation and automation tasks

Kimi K2 0711 is a large language model from Moonshot AI, with an approximate context window of 131,072 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

131K context · Input $0.6/1M tokens · Output $2.5/1M tokens · Released July 2025
Moonshot (Kimi) · kimi

Moonshot v1 128K

Moonshot v1 128K for very long-document analysis, retrieval-augmented reading and complex context workflows

Moonshot v1 128K is a Moonshot/Kimi model for very long-document analysis, retrieval-augmented reading and complex context workflows, often evaluated for Chinese document and knowledge workflows.

Moonshot (Kimi) · kimi-k2

Kimi K2

Kimi K2 for complex reasoning, coding, agent workflows and higher-value production tasks

Kimi K2 is a Moonshot/Kimi model for complex reasoning, coding, agent workflows and higher-value production tasks, often evaluated for Chinese document and knowledge workflows.

Anthropic

Anthropic · claude-opus

Claude Opus 4.7

Claude Opus 4.7 is a claude-opus model from Anthropic for assistants, generation and automation tasks

Claude Opus 4.7 is a large language model from Anthropic, with an approximate context window of 1,000,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1000K context · Multimodal · Input $5/1M tokens · Output $25/1M tokens · Released April 2026
Anthropic · claude-sonnet

Claude Sonnet 4.6

Claude Sonnet 4.6 is a claude-sonnet model from Anthropic for assistants, generation and automation tasks

Claude Sonnet 4.6 is a large language model from Anthropic, with an approximate context window of 1,000,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1000K context · Multimodal · Input $3/1M tokens · Output $15/1M tokens · Released February 2026
Anthropic · claude-opus

Claude Opus 4.6

Claude Opus 4.6 is a claude-opus model from Anthropic for assistants, generation and automation tasks

Claude Opus 4.6 is a large language model from Anthropic, with an approximate context window of 1,000,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

1000K context · Multimodal · Input $5/1M tokens · Output $25/1M tokens · Released February 2026
Anthropic · claude-opus

Claude Opus 4.5 (latest)

Claude Opus 4.5 (latest) is a claude-opus model from Anthropic for assistants, generation and automation tasks

Claude Opus 4.5 (latest) is a large language model from Anthropic, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context · Multimodal · Input $5/1M tokens · Output $25/1M tokens · Released November 2025
Anthropic · claude-haiku

Claude Haiku 4.5

Claude Haiku 4.5 (latest) is a claude-haiku model from Anthropic for assistants, generation and automation tasks

Claude Haiku 4.5 (latest) is a large language model from Anthropic, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context · Multimodal · Input $1/1M tokens · Output $5/1M tokens · Released October 2025
Anthropic · claude

Claude Sonnet 4.5

Claude Sonnet model optimized for coding, agents and complex workflows

Claude Sonnet 4.5 is positioned as a high-quality model for coding, long-form reasoning, agent workflows and structured professional writing. It is a strong option when reliability and instruction following matter.

200K context · Multimodal · Input $3/1M tokens · Output $15/1M tokens · Released September 2025
Anthropic · claude-opus

Claude Opus 4.1 (latest)

Claude Opus 4.1 (latest) is a claude-opus model from Anthropic for assistants, generation and automation tasks

Claude Opus 4.1 (latest) is a large language model from Anthropic, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context · Multimodal · Input $15/1M tokens · Output $75/1M tokens · Released August 2025
Anthropic · claude-opus

Claude Opus 4 (latest)

Claude Opus 4 (latest) is a claude-opus model from Anthropic for assistants, generation and automation tasks

Claude Opus 4 (latest) is a large language model from Anthropic, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context · Multimodal · Input $15/1M tokens · Output $75/1M tokens · Released May 2025
Anthropic · claude-sonnet

Claude Sonnet 4 (latest)

Claude Sonnet 4 (latest) is a claude-sonnet model from Anthropic for assistants, generation and automation tasks

Claude Sonnet 4 (latest) is a large language model from Anthropic, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context · Multimodal · Input $3/1M tokens · Output $15/1M tokens · Released May 2025
Anthropic · claude

Claude Sonnet 4

Claude Sonnet 4 model profile for capabilities, pricing and use cases

Claude Sonnet 4 is a large language model from Anthropic. This entry summarizes its positioning, typical use cases, strengths, limitations and related pricing signals for quick comparison.

Released May 2025
Anthropic · claude

Claude Opus 4

High-end Claude model for difficult reasoning, coding and long-running work

Claude Opus 4 is positioned for demanding tasks that need stronger reasoning, deeper code understanding and careful execution over longer workflows.

Released May 2025
Anthropic · claude-sonnet

Claude Sonnet 3.7

Claude Sonnet 3.7 is a claude-sonnet model from Anthropic for assistants, generation and automation tasks

Claude Sonnet 3.7 is a large language model from Anthropic, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context · Multimodal · Input $3/1M tokens · Output $15/1M tokens · Released February 2025
Anthropic · claude-haiku

Claude Haiku 3.5 (latest)

Claude Haiku 3.5 (latest) is a claude-haiku model from Anthropic for assistants, generation and automation tasks

Claude Haiku 3.5 (latest) is a large language model from Anthropic, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context · Multimodal · Input $0.8/1M tokens · Output $4/1M tokens · Released October 2024
Anthropic · claude-haiku

Claude Haiku 3.5

Claude Haiku 3.5 is a claude-haiku model from Anthropic for assistants, generation and automation tasks

Claude Haiku 3.5 is a large language model from Anthropic, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context · Multimodal · Input $0.8/1M tokens · Output $4/1M tokens · Released October 2024
Anthropic · claude-sonnet

Claude Sonnet 3.5 v2

Claude Sonnet 3.5 v2 is a claude-sonnet model from Anthropic for assistants, generation and automation tasks

Claude Sonnet 3.5 v2 is a large language model from Anthropic, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context · Multimodal · Input $3/1M tokens · Output $15/1M tokens · Released October 2024
Anthropic · claude-haiku

Claude Haiku 3

Claude Haiku 3 is a claude-haiku model from Anthropic for assistants, generation and automation tasks

Claude Haiku 3 is a large language model from Anthropic, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context · Multimodal · Input $0.25/1M tokens · Output $1.25/1M tokens · Released March 2024
Anthropic · claude-sonnet

Claude Sonnet 3

Claude Sonnet 3 is a claude-sonnet model from Anthropic for assistants, generation and automation tasks

Claude Sonnet 3 is a large language model from Anthropic, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context · Multimodal · Input $3/1M tokens · Output $15/1M tokens · Released March 2024
Anthropic · claude-opus

Claude Opus 3

Claude Opus 3 is a claude-opus model from Anthropic for assistants, generation and automation tasks

Claude Opus 3 is a large language model from Anthropic, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context · Multimodal · Input $15/1M tokens · Output $75/1M tokens · Released February 2024

StepFun

StepFun

Step 3.5 Flash 2603

Step 3.5 Flash 2603 is an AI model from StepFun for assistants, generation and automation tasks

Step 3.5 Flash 2603 is a large language model from StepFun, with an approximate context window of 256,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

256K context · Input $0.1/1M tokens · Output $0.3/1M tokens · Released April 2026
StepFun

Step 3.5 Flash

Step 3.5 Flash is an AI model from StepFun for assistants, generation and automation tasks

Step 3.5 Flash is a large language model from StepFun. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

Input ¥0.14/1M tokens · Released January 2026
StepFun · step-1

Step 1 32K

Step 1 32K for document understanding, summarization and knowledge Q&A

Step 1 32K is a StepFun model for document understanding, summarization and knowledge Q&A, commonly evaluated for Chinese assistants, document and multimodal workflows.

Input ¥15/1M tokens · Output ¥70/1M tokens
StepFun · step-1

Step 1 128K

Step 1 128K for very long-document analysis and retrieval-augmented workflows

Step 1 128K is a StepFun model for very long-document analysis and retrieval-augmented workflows, commonly evaluated for Chinese assistants, document and multimodal workflows.

StepFun · step-2

Step 2 Mini

Step 2 Mini for low-latency and cost-sensitive high-volume workloads

Step 2 Mini is a StepFun model for low-latency and cost-sensitive high-volume workloads, commonly evaluated for Chinese assistants, document and multimodal workflows.

Input ¥1/1M tokens · Output ¥2/1M tokens
StepFun · step-vision

Step 1V 8K

Step 1V 8K for visual Q&A, multimodal analysis and image-text understanding

Step 1V 8K is a StepFun model for visual Q&A, multimodal analysis and image-text understanding, commonly evaluated for Chinese assistants, document and multimodal workflows.

Multimodal · Input ¥5/1M tokens · Output ¥20/1M tokens

Mistral

Mistral · mistral-small

Mistral Small 4

Mistral Small 4 is a mistral-small model from Mistral for assistants, generation and automation tasks

Mistral Small 4 is a large language model from Mistral, with an approximate context window of 256,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

256K context · Multimodal · Input $0.15/1M tokens · Output $0.6/1M tokens · Released March 2026
Mistral · devstral

Devstral 2

Devstral 2 is a devstral model from Mistral for assistants, generation and automation tasks

Devstral 2 is a large language model from Mistral, with an approximate context window of 262,144 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

262K context · Input $0.4/1M tokens · Output $2/1M tokens · Released December 2025
Mistral · devstral

Devstral Small 2

Devstral Small 2 is a devstral model from Mistral for assistants, generation and automation tasks

Devstral Small 2 is a large language model from Mistral, with an approximate context window of 256,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

256K context · Multimodal · Input $0/1M tokens · Output $0/1M tokens · Released December 2025
Mistral · devstral

Devstral 2 (latest)

Devstral 2 (latest) is a devstral model from Mistral for assistants, generation and automation tasks

Devstral 2 (latest) is a large language model from Mistral, with an approximate context window of 262,144 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

262K context · Input $0.4/1M tokens · Output $2/1M tokens · Released December 2025
Mistral · mistral-medium

Mistral Medium 3.1

Mistral Medium 3.1 is a mistral-medium model from Mistral for assistants, generation and automation tasks

Mistral Medium 3.1 is a large language model from Mistral, with an approximate context window of 262,144 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

262K context · Multimodal · Input $0.4/1M tokens · Output $2/1M tokens · Released August 2025
Mistral · devstral

Devstral Medium

Devstral Medium is a devstral model from Mistral for assistants, generation and automation tasks

Devstral Medium is a large language model from Mistral, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context · Input $0.4/1M tokens · Output $2/1M tokens · Released July 2025
Mistral · devstral

Devstral Small

Devstral Small is a devstral model from Mistral for assistants, generation and automation tasks

Devstral Small is a large language model from Mistral, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context · Input $0.1/1M tokens · Output $0.3/1M tokens · Released July 2025
Mistral · mistral-small

Mistral Small 3.2

Mistral Small 3.2 is a mistral-small model from Mistral for assistants, generation and automation tasks

Mistral Small 3.2 is a large language model from Mistral, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context · Multimodal · Input $0.1/1M tokens · Output $0.3/1M tokens · Released June 2025
Mistral · devstral

Devstral Small 2505

Devstral Small 2505 is a devstral model from Mistral for assistants, generation and automation tasks

Devstral Small 2505 is a large language model from Mistral, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context · Input $0.1/1M tokens · Output $0.3/1M tokens · Released May 2025
Mistral · mistral-medium

Mistral Medium 3

Mistral Medium 3 is a mistral-medium model from Mistral for assistants, generation and automation tasks

Mistral Medium 3 is a large language model from Mistral, with an approximate context window of 131,072 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

131K context · Multimodal · Input $0.4/1M tokens · Output $2/1M tokens · Released May 2025
Mistral · magistral-medium

Magistral Medium (latest)

Magistral Medium (latest) is a magistral-medium model from Mistral for assistants, generation and automation tasks

Magistral Medium (latest) is a large language model from Mistral, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context · Input $2/1M tokens · Output $5/1M tokens · Released March 2025
Mistral · magistral-small

Magistral Small

Magistral Small is a magistral-small model from Mistral for assistants, generation and automation tasks

Magistral Small is a large language model from Mistral, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context · Input $0.5/1M tokens · Output $1.5/1M tokens · Released March 2025
Mistral · mistral-large

Mistral Large 3

Mistral Large 3 is a mistral-large model from Mistral for assistants, generation and automation tasks

Mistral Large 3 is a large language model from Mistral, with an approximate context window of 262,144 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

262K context · Multimodal · Input $0.5/1M tokens · Output $1.5/1M tokens · Released November 2024
Mistral · mistral-large

Mistral Large 2.1

Mistral Large 2.1 is a mistral-large model from Mistral for assistants, generation and automation tasks

Mistral Large 2.1 is a large language model from Mistral, with an approximate context window of 131,072 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

131K context · Input $2/1M tokens · Output $6/1M tokens · Released November 2024
Mistral · ministral

Ministral 3B (latest)

Ministral 3B (latest) is a ministral model from Mistral for assistants, generation and automation tasks

Ministral 3B (latest) is a large language model from Mistral, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context · Input $0.04/1M tokens · Output $0.04/1M tokens · Released October 2024
Mistral · pixtral

Pixtral 12B

Pixtral 12B is a pixtral model from Mistral for assistants, generation and automation tasks

Pixtral 12B is a large language model from Mistral, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context · Multimodal · Input $0.15/1M tokens · Output $0.15/1M tokens · Released September 2024
Mistral · mistral-nemo

Mistral Nemo

Mistral Nemo is a mistral-nemo model from Mistral for assistants, generation and automation tasks

Mistral Nemo is a large language model from Mistral, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context · Input $0.15/1M tokens · Output $0.15/1M tokens · Released July 2024
Mistral · mixtral

Mixtral 8x22B

Mixtral 8x22B is a mixtral model from Mistral for assistants, generation and automation tasks

Mixtral 8x22B is a large language model from Mistral, with an approximate context window of 64,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

64K context · Input $2/1M tokens · Output $6/1M tokens · Released April 2024
Mistral · mixtral

Mixtral 8x7B

Mixtral 8x7B is a mixtral model from Mistral for assistants, generation and automation tasks

Mixtral 8x7B is a large language model from Mistral, with an approximate context window of 32,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

32K context · Input $0.7/1M tokens · Output $0.7/1M tokens · Released December 2023
Mistral · mistral

Mistral 7B

Mistral 7B is a mistral model from Mistral for assistants, generation and automation tasks

Mistral 7B is a large language model from Mistral, with an approximate context window of 8,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

8K context · Input $0.25/1M tokens · Output $0.25/1M tokens · Released September 2023
Mistral · codestral

Codestral

Codestral for code completion, code generation and developer assistance

Codestral is a Mistral model for code completion, code generation and developer assistance, with options across enterprise, coding, vision and open-weight workflows.

256K context · Input $0.3/1M tokens · Output $0.9/1M tokens
Mistral · mistral-small

Mistral Small

Mistral Small for low-latency, high-concurrency and cost-sensitive workloads

Mistral Small is a Mistral model for low-latency, high-concurrency and cost-sensitive workloads, with options across enterprise, coding, vision and open-weight workflows.

256K context · Multimodal · Input $0.15/1M tokens · Output $0.6/1M tokens
Mistral · mistral-large

Mistral Large

Mistral Large for complex reasoning, enterprise Q&A and multilingual tasks

Mistral Large is a Mistral model for complex reasoning, enterprise Q&A and multilingual tasks, with options across enterprise, coding, vision and open-weight workflows.

262K context | Multimodal | Input $0.5/1M tokens | Output $1.5/1M tokens

Mistral | ministral

Ministral 8B

Ministral 8B for edge deployment, low-cost usage and basic text tasks

Ministral 8B is a Mistral model for edge deployment, low-cost usage and basic text tasks, with options across enterprise, coding, vision and open-weight workflows.

128K context | Input $0.1/1M tokens | Output $0.1/1M tokens

Mistral | pixtral

Pixtral Large

Pixtral Large for image-text understanding, visual Q&A and multimodal analysis

Pixtral Large is a Mistral model for image-text understanding, visual Q&A and multimodal analysis, with options across enterprise, coding, vision and open-weight workflows.

128K context | Multimodal | Input $2/1M tokens | Output $6/1M tokens

Mistral | mixtral

Mixtral 8x7B

Mixtral 8x7B for general language tasks, research and self-hosted deployments

Mixtral 8x7B is a Mistral model for general language tasks, research and self-hosted deployments, with options across enterprise, coding, vision and open-weight workflows.

Zhipu AI

Zhipu AI | glm-flash

GLM-4.7-FlashX

GLM-4.7-FlashX is a glm-flash model from Zhipu AI for assistants, generation and automation tasks

GLM-4.7-FlashX is a large language model from Zhipu AI, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context | Input $0.07/1M tokens | Output $0.4/1M tokens | Released January 2026

Zhipu AI | glm-flash

GLM-4.7-Flash

GLM-4.7-Flash is a glm-flash model from Zhipu AI for assistants, generation and automation tasks

GLM-4.7-Flash is a large language model from Zhipu AI, with an approximate context window of 200,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

200K context | Input $0/1M tokens | Output $0/1M tokens | Released January 2026

Zhipu AI | glm

GLM-4.7

GLM-4.7 is a glm model from Zhipu AI for assistants, generation and automation tasks

GLM-4.7 is a large language model from Zhipu AI, with an approximate context window of 204,800 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

205K context | Input $0.6/1M tokens | Output $2.2/1M tokens | Released December 2025

Zhipu AI | glm

GLM-4.6V

GLM-4.6V is a glm model from Zhipu AI for assistants, generation and automation tasks

GLM-4.6V is a large language model from Zhipu AI, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Multimodal | Input $0.3/1M tokens | Output $0.9/1M tokens | Released December 2025

Zhipu AI | glm

GLM-4.6

GLM-4.6 is a glm model from Zhipu AI for assistants, generation and automation tasks

GLM-4.6 is a large language model from Zhipu AI, with an approximate context window of 204,800 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

205K context | Input $0.6/1M tokens | Output $2.2/1M tokens | Released September 2025

Zhipu AI | glm

GLM-4.5V

GLM-4.5V is a glm model from Zhipu AI for assistants, generation and automation tasks

GLM-4.5V is a large language model from Zhipu AI, with an approximate context window of 64,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

64K context | Multimodal | Input $0.6/1M tokens | Output $1.8/1M tokens | Released August 2025

Zhipu AI | glm-air

GLM-4.5-Air

GLM-4.5-Air is a glm-air model from Zhipu AI for assistants, generation and automation tasks

GLM-4.5-Air is a large language model from Zhipu AI, with an approximate context window of 131,072 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

131K context | Input $0.2/1M tokens | Output $1.1/1M tokens | Released July 2025

Zhipu AI | glm

GLM-4.5

GLM-4.5 is a glm model from Zhipu AI for assistants, generation and automation tasks

GLM-4.5 is a large language model from Zhipu AI, with an approximate context window of 131,072 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

131K context | Input $0.6/1M tokens | Output $2.2/1M tokens | Released July 2025

Zhipu AI | glm-flash

GLM-4.5-Flash

GLM-4.5-Flash is a glm-flash model from Zhipu AI for assistants, generation and automation tasks

GLM-4.5-Flash is a large language model from Zhipu AI, with an approximate context window of 131,072 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

131K context | Input $0/1M tokens | Output $0/1M tokens | Released July 2025

Zhipu AI | glm

GLM-4-Air

GLM-4-Air for low-cost, high-concurrency Chinese assistant workloads

GLM-4-Air is a Zhipu AI GLM model for low-cost, high-concurrency Chinese assistant workloads, commonly evaluated for Chinese enterprise and agent workflows.

Zhipu AI | glm

GLM-4-Flash

GLM-4-Flash for fast responses, lightweight automation and high-frequency conversations

GLM-4-Flash is a Zhipu AI GLM model for fast responses, lightweight automation and high-frequency conversations, commonly evaluated for Chinese enterprise and agent workflows.

Zhipu AI | glm-4.5

GLM-4.5

GLM-4.5 for agent workflows, coding and complex reasoning tasks

GLM-4.5 is a Zhipu AI GLM model for agent workflows, coding and complex reasoning tasks, commonly evaluated for Chinese enterprise and agent workflows.

Zhipu AI | glm-z1

GLM-Z1

GLM-Z1 for multi-step reasoning, math, coding and complex problem solving

GLM-Z1 is a Zhipu AI GLM model for multi-step reasoning, math, coding and complex problem solving, commonly evaluated for Chinese enterprise and agent workflows.

Cohere

Cohere | command-a

Command A Translate

Command A Translate is a command-a model from Cohere for assistants, generation and automation tasks

Command A Translate is a large language model from Cohere, with an approximate context window of 8,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

8K context | Input $2.5/1M tokens | Output $10/1M tokens | Released August 2025

Cohere | command-a

Command A Reasoning

Command A Reasoning is a command-a model from Cohere for assistants, generation and automation tasks

Command A Reasoning is a large language model from Cohere, with an approximate context window of 256,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

256K context | Input $2.5/1M tokens | Output $10/1M tokens | Released August 2025

Cohere | command-a

Command A Vision

Command A Vision is a command-a model from Cohere for assistants, generation and automation tasks

Command A Vision is a large language model from Cohere, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Multimodal | Input $2.5/1M tokens | Output $10/1M tokens | Released July 2025

Cohere | command-a

Command A

Command A is a command-a model from Cohere for assistants, generation and automation tasks

Command A is a large language model from Cohere, with an approximate context window of 256,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

256K context | Input $2.5/1M tokens | Output $10/1M tokens | Released March 2025

Cohere | command-r

Command R7B Arabic

Command R7B Arabic is a command-r model from Cohere for assistants, generation and automation tasks

Command R7B Arabic is a large language model from Cohere, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Input $0.0375/1M tokens | Output $0.15/1M tokens | Released February 2025

Cohere | command-r

Command R

Command R is a command-r model from Cohere for assistants, generation and automation tasks

Command R is a large language model from Cohere, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Input $0.15/1M tokens | Output $0.6/1M tokens | Released August 2024

Cohere | command-r

Command R+

Command R+ is a command-r model from Cohere for assistants, generation and automation tasks

Command R+ is a large language model from Cohere, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Input $2.5/1M tokens | Output $10/1M tokens | Released August 2024

Cohere | command-r

Command R7B

Command R7B is a command-r model from Cohere for assistants, generation and automation tasks

Command R7B is a large language model from Cohere, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Input $0.0375/1M tokens | Output $0.15/1M tokens | Released February 2024

Cohere | command

Command A

Command A for enterprise agents and complex generation tasks

Command A is a Cohere generation model for enterprise Q&A, RAG, agents and multilingual generation.

Cohere | command-r

Command R+

Command R+ for advanced retrieval-augmented generation, tool use and enterprise Q&A

Command R+ is a Cohere generation model for enterprise Q&A, RAG, agents and multilingual generation.

Cohere | command-r

Command R

Command R for RAG, long-context and multilingual Q&A

Command R is a Cohere generation model for enterprise Q&A, RAG, agents and multilingual generation.

Cohere | command

Command Light

Command Light for low-latency text generation and basic chat

Command Light is a Cohere generation model for enterprise Q&A, RAG, agents and multilingual generation.

Cohere | embed

Embed v4.0

Embed v4.0 for semantic search, clustering and RAG knowledge-base indexing

Embed v4.0 is a Cohere embedding model for semantic search, RAG indexing, clustering and similarity workflows.

Cohere | rerank

Rerank v3.5

Rerank v3.5 for reranking search results and improving RAG answer quality

Rerank v3.5 is a Cohere reranking model for search reranking, RAG refinement and answer-quality improvements.

Meta Llama

Meta Llama | llama

Llama-4-Maverick-17B-128E-Instruct-FP8

Llama-4-Maverick-17B-128E-Instruct-FP8 is a llama model from Meta for assistants, generation and automation tasks

Llama-4-Maverick-17B-128E-Instruct-FP8 is a large language model from Meta, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Multimodal | Input $0/1M tokens | Output $0/1M tokens | Released April 2025

Meta Llama | llama

Cerebras-Llama-4-Scout-17B-16E-Instruct

Cerebras-Llama-4-Scout-17B-16E-Instruct is a llama model from Meta for assistants, generation and automation tasks

Cerebras-Llama-4-Scout-17B-16E-Instruct is a large language model from Meta, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Input $0/1M tokens | Output $0/1M tokens | Released April 2025

Meta Llama | llama

Llama-4-Scout-17B-16E-Instruct-FP8

Llama-4-Scout-17B-16E-Instruct-FP8 is a llama model from Meta for assistants, generation and automation tasks

Llama-4-Scout-17B-16E-Instruct-FP8 is a large language model from Meta, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Multimodal | Input $0/1M tokens | Output $0/1M tokens | Released April 2025

Meta Llama | llama

Cerebras-Llama-4-Maverick-17B-128E-Instruct

Cerebras-Llama-4-Maverick-17B-128E-Instruct is a llama model from Meta for assistants, generation and automation tasks

Cerebras-Llama-4-Maverick-17B-128E-Instruct is a large language model from Meta, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Input $0/1M tokens | Output $0/1M tokens | Released April 2025

Meta Llama | llama

Groq-Llama-4-Maverick-17B-128E-Instruct

Groq-Llama-4-Maverick-17B-128E-Instruct is a llama model from Meta for assistants, generation and automation tasks

Groq-Llama-4-Maverick-17B-128E-Instruct is a large language model from Meta, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Input $0/1M tokens | Output $0/1M tokens | Released April 2025

Meta Llama | llama

Llama-3.3-8B-Instruct

Llama-3.3-8B-Instruct is a llama model from Meta for assistants, generation and automation tasks

Llama-3.3-8B-Instruct is a large language model from Meta, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Input $0/1M tokens | Output $0/1M tokens | Released December 2024

Meta Llama | llama

Llama-3.3-70B-Instruct

Llama-3.3-70B-Instruct is a llama model from Meta for assistants, generation and automation tasks

Llama-3.3-70B-Instruct is a large language model from Meta, with an approximate context window of 128,000 tokens. It can be evaluated for assistants, knowledge Q&A, content generation, structured extraction and business automation. Pricing and availability may vary by upstream provider or relay service.

128K context | Input $0/1M tokens | Output $0/1M tokens | Released December 2024

Meta Llama | llama-3.1

Llama 3.1 8B

Llama 3.1 8B for local deployment, basic chat and low-cost inference

Llama 3.1 8B is a Meta Llama model for local deployment, basic chat and low-cost inference, commonly evaluated for open ecosystems, self-hosting and customizable AI products.

Meta Llama | llama-3.1

Llama 3.1 70B

Llama 3.1 70B for general language understanding, generation and enterprise self-hosting

Llama 3.1 70B is a Meta Llama model for general language understanding, generation and enterprise self-hosting, commonly evaluated for open ecosystems, self-hosting and customizable AI products.

Meta Llama | llama-3.1

Llama 3.1 405B

Llama 3.1 405B for complex reasoning, multilingual tasks and high-quality generation

Llama 3.1 405B is a Meta Llama model for complex reasoning, multilingual tasks and high-quality generation, commonly evaluated for open ecosystems, self-hosting and customizable AI products.

Meta Llama | llama-3.2

Llama 3.2 Vision

Llama 3.2 Vision for image-text understanding, visual Q&A and multimodal applications

Llama 3.2 Vision is a Meta Llama model for image-text understanding, visual Q&A and multimodal applications, commonly evaluated for open ecosystems, self-hosting and customizable AI products.

Meta Llama | llama-4

Llama 4 Scout

Llama 4 Scout for multimodal, long-context and efficient reasoning workflows

Llama 4 Scout is a Meta Llama model for multimodal, long-context and efficient reasoning workflows, commonly evaluated for open ecosystems, self-hosting and customizable AI products.

Meta Llama | llama-4

Llama 4 Maverick

Llama 4 Maverick for complex multimodal tasks, agents and high-quality generation

Llama 4 Maverick is a Meta Llama model for complex multimodal tasks, agents and high-quality generation, commonly evaluated for open ecosystems, self-hosting and customizable AI products.

Xiaomi (MiMo)

Xiaomi (MiMo) | mimo

MiMo V2.5

MiMo V2.5 for Chinese conversation, content generation and tool-use workflows

MiMo V2.5 is a Xiaomi MiMo model for Chinese conversation, content generation and tool-use workflows, useful for Chinese assistants, tool use and ecosystem-oriented applications.

1000K context | Multimodal | Input ¥2.8/1M tokens | Output ¥14/1M tokens

Xiaomi (MiMo) | mimo

MiMo V2.5 Pro

MiMo V2.5 Pro for complex reasoning, coding and long-text tasks

MiMo V2.5 Pro is a Xiaomi MiMo model for complex reasoning, coding and long-text tasks, useful for Chinese assistants, tool use and ecosystem-oriented applications.

1000K context | Input ¥7/1M tokens | Output ¥21/1M tokens

Xiaomi (MiMo) | mimo

MiMo V2 Flash

MiMo V2 Flash for low-latency, high-frequency chat and quick responses

MiMo V2 Flash is a Xiaomi MiMo model for low-latency, high-frequency chat and quick responses, useful for Chinese assistants, tool use and ecosystem-oriented applications.

256K context | Input ¥0.7/1M tokens | Output ¥2.1/1M tokens

Xiaomi (MiMo) | mimo

MiMo V2 Omni

MiMo V2 Omni for multimodal understanding and integrated interactive experiences

MiMo V2 Omni is a Xiaomi MiMo model for multimodal understanding and integrated interactive experiences, useful for Chinese assistants, tool use and ecosystem-oriented applications.

256K context | Multimodal | Input ¥2.8/1M tokens | Output ¥14/1M tokens

Xiaomi (MiMo) | mimo

MiMo V2 Pro

MiMo V2 Pro for complex tasks, coding assistance and business automation

MiMo V2 Pro is a Xiaomi MiMo model for complex tasks, coding assistance and business automation, useful for Chinese assistants, tool use and ecosystem-oriented applications.

1000K context | Input ¥7/1M tokens | Output ¥21/1M tokens

MiniMax

MiniMax | abab

ABAB6.5s Chat

ABAB6.5s Chat for Chinese chat, writing and business assistant workloads

ABAB6.5s Chat is a MiniMax model for Chinese chat, writing and business assistant workloads, often evaluated for Chinese assistants, generation and multimedia workflows.

MiniMax | abab

ABAB6.5 Chat

ABAB6.5 Chat for general conversation, long-text understanding and complex interactions

ABAB6.5 Chat is a MiniMax model for general conversation, long-text understanding and complex interactions, often evaluated for Chinese assistants, generation and multimedia workflows.

MiniMax | minimax-text

MiniMax Text 01

MiniMax Text 01 for general language understanding, generation and agent workflows

MiniMax Text 01 is a MiniMax model for general language understanding, generation and agent workflows, often evaluated for Chinese assistants, generation and multimedia workflows.

MiniMax | minimax-m

MiniMax M1

MiniMax M1 for long-context reasoning, coding and complex task planning

MiniMax M1 is a MiniMax model for long-context reasoning, coding and complex task planning, often evaluated for Chinese assistants, generation and multimedia workflows.

MiniMax | minimax-speech

MiniMax Speech 01

MiniMax Speech 01 for voice generation, conversation and multimedia content

MiniMax Speech 01 focuses on voice generation and multimedia experiences rather than general text conversation.

Volcengine

Volcengine | doubao

Doubao Pro

Doubao Pro for higher-quality Chinese assistants, content generation and business automation

Doubao Pro is a Volcengine Doubao model for higher-quality Chinese assistants, content generation and business automation, often evaluated for Chinese enterprise and multimodal workflows.

Volcengine | doubao

Doubao Lite

Doubao Lite for low-cost, high-concurrency chat and lightweight text tasks

Doubao Lite is a Volcengine Doubao model for low-cost, high-concurrency chat and lightweight text tasks, often evaluated for Chinese enterprise and multimodal workflows.

Volcengine | doubao-seed

Doubao Seed 1.6

Doubao Seed 1.6 for general chat, reasoning and agent workflow evaluation

Doubao Seed 1.6 is a Volcengine Doubao model for general chat, reasoning and agent workflow evaluation, often evaluated for Chinese enterprise and multimodal workflows.

Input ¥0.8/1M tokens | Output ¥2/1M tokens

Volcengine | doubao-thinking

Doubao Seed 1.6 Thinking

Doubao Seed 1.6 Thinking for complex reasoning, multi-step analysis and coding assistance

Doubao Seed 1.6 Thinking is a Volcengine Doubao model for complex reasoning, multi-step analysis and coding assistance, often evaluated for Chinese enterprise and multimodal workflows.

Volcengine | doubao-vision

Doubao Vision Pro

Doubao Vision Pro for image-text analysis, multimodal Q&A and visual content understanding

Doubao Vision Pro is a Volcengine Doubao model for image-text analysis, multimodal Q&A and visual content understanding, often evaluated for Chinese enterprise and multimodal workflows.

Baidu (ERNIE)

Baidu (ERNIE) | ernie

ERNIE 4.0 Turbo 8K

ERNIE 4.0 Turbo 8K for Chinese understanding, content generation and enterprise applications

ERNIE 4.0 Turbo 8K is a Baidu ERNIE/Qianfan model for Chinese understanding, content generation and enterprise applications, commonly evaluated for Chinese enterprise workloads.

Baidu (ERNIE) | ernie-4.5

ERNIE 4.5 Turbo

ERNIE 4.5 Turbo for multi-scenario Chinese tasks, knowledge Q&A and business assistants

ERNIE 4.5 Turbo is a Baidu ERNIE/Qianfan model for multi-scenario Chinese tasks, knowledge Q&A and business assistants, commonly evaluated for Chinese enterprise workloads.

Baidu (ERNIE) | ernie-x1

ERNIE X1

ERNIE X1 for complex analysis, logical reasoning and multi-step problem solving

ERNIE X1 is a Baidu ERNIE/Qianfan model for complex analysis, logical reasoning and multi-step problem solving, commonly evaluated for Chinese enterprise workloads.

Baidu (ERNIE) | ernie-speed

ERNIE Speed

ERNIE Speed for low-latency conversations and high-frequency basic text tasks

ERNIE Speed is a Baidu ERNIE/Qianfan model for low-latency conversations and high-frequency basic text tasks, commonly evaluated for Chinese enterprise workloads.

Baidu (ERNIE) | ernie-lite

ERNIE Lite

ERNIE Lite for cost-sensitive Q&A, summarization and content generation

ERNIE Lite is a Baidu ERNIE/Qianfan model for cost-sensitive Q&A, summarization and content generation, commonly evaluated for Chinese enterprise workloads.