Why are input and output prices different?
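Model providers often charge more for output tokens than for input tokens. Output is generated one token at a time, while input tokens can be processed in parallel, so generation uses more inference compute per token.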
Quickly estimate your API usage cost.
Compare official pricing, relay multipliers and token volume in one place.
Paste text to estimate how many tokens it may consume. The text is processed locally.
This is a practical estimate. Exact token counts may vary by model tokenizer.
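A rough estimate of this kind can be sketched in a few lines of Python. The ratios below are common rules of thumb assumed for illustration (about 4 ASCII characters per token, and one token per non-ASCII character such as a CJK ideograph); they are not any provider's actual tokenizer, and `estimate_tokens` is a hypothetical helper name.

```python
import math


def estimate_tokens(text: str, ascii_chars_per_token: float = 4.0) -> int:
    """Return a rough token estimate for `text`.

    Assumptions (rules of thumb, not a real tokenizer):
    - ASCII text averages `ascii_chars_per_token` characters per token.
    - Every non-ASCII character counts as one token.
    """
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    other_chars = len(text) - ascii_chars
    return math.ceil(ascii_chars / ascii_chars_per_token) + other_chars


if __name__ == "__main__":
    sample = "Paste text to estimate how many tokens it may consume."
    print(estimate_tokens(sample))
```

For billing-grade counts, the model's own tokenizer should be used instead, since real counts depend on vocabulary and language.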
| Model | Provider | Context | Input ($ / 1M tokens) | Output ($ / 1M tokens) |
|---|---|---|---|---|
| GPT-4o | OpenAI | 128K | 5.00 | 15.00 |
| GPT-4 Turbo | OpenAI | 128K | 10.00 | 30.00 |
| Claude 3.5 Sonnet | Anthropic | 200K | 3.00 | 15.00 |
| Claude 3 Opus | Anthropic | 200K | 15.00 | 75.00 |
| Gemini 1.5 Pro | Google | 2M | 3.50 | 10.50 |
| DeepSeek Chat | DeepSeek | 64K | 0.27 | 1.10 |
| Qwen Plus | Alibaba Cloud | 128K | 0.40 | 1.20 |
* Prices are for reference only. Actual billing follows the official API provider.
Most model APIs charge input and output tokens separately. Relay services may add a multiplier or custom unit price.
The prompt, system instructions, conversation history and tool context are usually billed as input tokens.
Model responses are usually billed as output tokens, often at a higher unit price than input tokens.
Relay sites may apply a markup or package conversion, so compare the final cost with official pricing.
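The billing rules above reduce to simple arithmetic. The following is a minimal sketch, assuming prices are quoted in USD per 1 million tokens and that a relay applies a single flat multiplier to the total; `estimate_cost` and its parameter names are hypothetical, and real relays may price differently.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
    relay_multiplier: float = 1.0,
) -> float:
    """Estimate the cost of a request in USD.

    Assumes prices are per 1,000,000 tokens and that the relay
    multiplier (1.0 = no markup) scales the whole bill.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return (input_cost + output_cost) * relay_multiplier


if __name__ == "__main__":
    # 200K input and 50K output tokens at 5.00 / 15.00 per 1M tokens.
    official = estimate_cost(200_000, 50_000, 5.00, 15.00)
    # Same usage through a relay with a hypothetical 1.2x multiplier.
    relayed = estimate_cost(200_000, 50_000, 5.00, 15.00, relay_multiplier=1.2)
    print(f"official: ${official:.2f}, relay: ${relayed:.2f}")
```

Keeping input and output as separate terms makes it easy to see which side dominates the bill: long prompts with short answers are mostly input cost, while long generations are mostly output cost.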
The text tokenizer here is a practical estimate. Exact token counts can vary by model and provider tokenizer.
Compare the final cost after applying the exchange rate and relay multiplier, and check which models the relay actually supports, rather than relying on the advertised unit price alone.
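One way to make that comparison concrete is to convert the relay's final charge to USD and express it as a multiple of the official cost. This is a sketch under stated assumptions: the relay bills in a local currency, the exchange rate is supplied by the caller, and `relay_vs_official` is a hypothetical helper. The figures in the example are illustrative, not real quotes.

```python
def relay_vs_official(
    official_cost_usd: float,
    relay_cost_local: float,
    local_per_usd: float,
) -> float:
    """Return the relay cost as a multiple of the official cost.

    1.0 means the same price, below 1.0 means the relay is cheaper,
    above 1.0 means it is more expensive.
    `local_per_usd` is how many units of local currency buy 1 USD.
    """
    if official_cost_usd <= 0 or local_per_usd <= 0:
        raise ValueError("official cost and exchange rate must be positive")
    relay_cost_usd = relay_cost_local / local_per_usd
    return relay_cost_usd / official_cost_usd


if __name__ == "__main__":
    # Illustrative numbers: official bill of 1.75 USD, relay bill of
    # 10.00 in local currency, at an assumed rate of 7.0 local per USD.
    ratio = relay_vs_official(1.75, 10.00, 7.0)
    print(f"relay costs {ratio:.2f}x the official price")
```

A ratio is only meaningful if the relay serves the same model; a cheaper price for a different or renamed model is not a like-for-like comparison.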