LLM Speed Benchmark 2026
Compare first-token latency and throughput across 15+ models. Filter by provider, category, and price. Updated 2026-06-13.
| Model | Modalities | Provider | Category | First token | Throughput | Context | Price / 1M tokens |
|---|---|---|---|---|---|---|---|
| Mistral Small | text | Mistral | fast | 150 ms | 190 tokens/s | 32k | $0.80 |
| Gemini 2.0 Flash | text, vision, audio | Google | fast | 160 ms | 180 tokens/s | 1000k | $0.50 |
| Llama 4 Scout | text, vision | Meta | fast | 170 ms | 175 tokens/s | 256k | $0.80 |
| GPT-4o mini | text, vision | OpenAI | fast | 180 ms | 165 tokens/s | 128k | $0.75 |
| Claude 3.5 Haiku | text, vision | Anthropic | fast | 210 ms | 145 tokens/s | 200k | $4.80 |
| DeepSeek V3 | text | DeepSeek | balanced | 260 ms | 120 tokens/s | 64k | $1.30 |
| Llama 3.3 70B | text | Meta | balanced | 290 ms | 115 tokens/s | 128k | $1.60 |
| Amazon Nova Pro | text, vision | AWS | balanced | 310 ms | 110 tokens/s | 300k | $4.00 |
| GPT-4o | text, vision | OpenAI | frontier | 320 ms | 108 tokens/s | 128k | $12.50 |
| Mistral Large | text | Mistral | frontier | 340 ms | 100 tokens/s | 128k | $8.00 |
| Grok 3 | text, vision | xAI | frontier | 360 ms | 105 tokens/s | 128k | $18.00 |
| Gemini 2.5 Pro | text, vision, audio | Google | frontier | 380 ms | 95 tokens/s | 1000k | $11.25 |
| Claude 3.7 Sonnet | text, vision | Anthropic | frontier | 450 ms | 85 tokens/s | 200k | $18.00 |
| o3-mini | text | OpenAI | reasoning | 850 ms | 70 tokens/s | 200k | $5.50 |
| DeepSeek R1 | text | DeepSeek | reasoning | 1200 ms | 55 tokens/s | 64k | $2.74 |
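The two speed columns can be combined into a rough estimate of how long a complete response takes. The sketch below is illustrative only: the function name `estimated_response_seconds` and the 500-token response length are assumptions, and the model assumes throughput stays constant after the first token, which real deployments do not guarantee.

```python
def estimated_response_seconds(first_token_ms: float,
                               tokens_per_s: float,
                               output_tokens: int) -> float:
    """Rough end-to-end time for one response (hypothetical helper).

    Adds the first-token latency to the time needed to generate the
    remaining tokens at a constant throughput.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    # The first token is covered by the latency figure, so only the
    # remaining tokens are generated at the throughput rate.
    remaining_tokens = max(output_tokens - 1, 0)
    return first_token_ms / 1000.0 + remaining_tokens / tokens_per_s


# Figures taken from the table above; 500 output tokens is an assumed length.
mistral_small = estimated_response_seconds(150, 190, 500)  # about 2.78 s
deepseek_r1 = estimated_response_seconds(1200, 55, 500)    # about 10.27 s
print(f"Mistral Small: {mistral_small:.2f} s, DeepSeek R1: {deepseek_r1:.2f} s")
```

For short replies the first-token latency dominates the total; for long outputs the throughput term does.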
How we measure speed
First-token latency (FTL)
The time from sending a prompt to receiving the first token. Lower is better for interactive applications like chat and coding assistants.
Throughput (tokens/sec)
The rate at which the model generates tokens after the first one. Higher is better for long-form content, summarization, and batch jobs.
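As a minimal sketch of the two definitions above, both metrics can be derived from the time a request was sent and the arrival time of each streamed token. The function name `speed_metrics` and the timestamp inputs are hypothetical; this is not the harness used for the published figures, and it assumes the timestamps come from a monotonic clock such as `time.perf_counter()`.

```python
import math
from typing import Sequence, Tuple


def speed_metrics(request_sent: float,
                  token_times: Sequence[float]) -> Tuple[float, float]:
    """Return (first_token_latency_s, throughput_tokens_per_s).

    request_sent: timestamp taken just before the prompt was sent.
    token_times:  arrival timestamp of each streamed token, in order.
    """
    if not token_times:
        raise ValueError("no tokens were received")

    # First-token latency: time from sending the prompt to the first token.
    first_token_latency = token_times[0] - request_sent

    # Throughput: tokens generated after the first one, divided by the
    # time elapsed between the first and the last token.
    tokens_after_first = len(token_times) - 1
    generation_time = token_times[-1] - token_times[0]
    if tokens_after_first == 0 or generation_time <= 0:
        throughput = math.nan  # undefined for a single-token response
    else:
        throughput = tokens_after_first / generation_time

    return first_token_latency, throughput


# Made-up timestamps: first token after 0.2 s, then one token every 0.1 s.
latency, throughput = speed_metrics(0.0, [0.2, 0.3, 0.4, 0.5, 0.6])
print(f"FTL: {latency * 1000:.0f} ms, throughput: {throughput:.1f} tokens/s")
```

Excluding the first token from the throughput calculation keeps the two metrics independent, so a slow start does not drag down the reported generation rate.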
Numbers are representative benchmarks collected from public provider documentation and independent tests. Actual performance varies by region, load, and prompt length. Last updated: 2026-06-12.