VePrompts

LLM Speed Benchmark 2026

Compare first-token latency and throughput across 15+ models. Filter by provider, category, and price. Updated 2026-06-13.

15 models

| Model | Modalities | Provider | Category | First token (ms) | Throughput (tokens/s) | Context | Price / 1M tokens |
|---|---|---|---|---|---|---|---|
| Mistral Small | text | Mistral | fast | 150 | 190 | 32k | $0.80 |
| Gemini 2.0 Flash | text, vision, audio | Google | fast | 160 | 180 | 1000k | $0.50 |
| Llama 4 Scout | text, vision | Meta | fast | 170 | 175 | 256k | $0.80 |
| GPT-4o mini | text, vision | OpenAI | fast | 180 | 165 | 128k | $0.75 |
| Claude 3.5 Haiku | text, vision | Anthropic | fast | 210 | 145 | 200k | $4.80 |
| DeepSeek V3 | text | DeepSeek | balanced | 260 | 120 | 64k | $1.30 |
| Llama 3.3 70B | text | Meta | balanced | 290 | 115 | 128k | $1.60 |
| Amazon Nova Pro | text, vision | AWS | balanced | 310 | 110 | 300k | $4.00 |
| GPT-4o | text, vision | OpenAI | frontier | 320 | 108 | 128k | $12.50 |
| Mistral Large | text | Mistral | frontier | 340 | 100 | 128k | $8.00 |
| Grok 3 | text, vision | xAI | frontier | 360 | 105 | 128k | $18.00 |
| Gemini 2.5 Pro | text, vision, audio | Google | frontier | 380 | 95 | 1000k | $11.25 |
| Claude 3.7 Sonnet | text, vision | Anthropic | frontier | 450 | 85 | 200k | $18.00 |
| o3-mini | text | OpenAI | reasoning | 850 | 70 | 200k | $5.50 |
| DeepSeek R1 | text | DeepSeek | reasoning | 1200 | 55 | 64k | $2.74 |

How we measure speed

First-token latency (FTL)

The time from sending a prompt to receiving the first token. Lower is better for interactive applications like chat and coding assistants.
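
As a rough illustration of the definition above, first-token latency can be timed on the client by starting a clock when the request is sent and stopping it when the first token arrives. The sketch below assumes the response is exposed as a Python iterator of tokens. `measure_first_token_latency` and `fake_stream` are hypothetical names, and the simulated stream stands in for a real provider client; this is not the harness used for the numbers on this page.

```python
import time
from typing import Callable, Iterable, Iterator


def measure_first_token_latency(send_prompt: Callable[[], Iterable[str]]) -> float:
    """Return the seconds elapsed between sending the prompt and the first token."""
    start = time.perf_counter()
    stream = iter(send_prompt())  # hypothetical: calling this sends the request
    first = next(stream, None)    # blocks until the first token arrives
    if first is None:
        raise ValueError("stream ended before any token was received")
    return time.perf_counter() - start


def fake_stream() -> Iterator[str]:
    """Stand-in for a streaming API call (assumption: about 50 ms before token one)."""
    time.sleep(0.05)
    yield "Hello"
    yield " world"


ftl = measure_first_token_latency(fake_stream)
print(f"First-token latency: {ftl * 1000:.0f} ms")
```

With a real client, `send_prompt` would wrap whatever call opens the streaming response, so that network round-trip time is included in the measurement.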

Throughput (tokens/sec)

The rate at which the model generates tokens after the first one. Higher is better for long-form content, summarization, and batch jobs.
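
A minimal sketch of that calculation, assuming the client records an arrival timestamp (in seconds) for every token. The first token is excluded from the count because its wait is already captured by first-token latency. `throughput_tokens_per_sec` is a hypothetical helper name, and the timestamps are made-up illustration values.

```python
def throughput_tokens_per_sec(arrival_times: list[float]) -> float:
    """Tokens generated per second after the first token.

    arrival_times holds one timestamp per token, in arrival order.
    """
    if len(arrival_times) < 2:
        raise ValueError("need at least two tokens to measure throughput")
    elapsed = arrival_times[-1] - arrival_times[0]
    if elapsed <= 0:
        raise ValueError("timestamps must be strictly increasing overall")
    # Tokens after the first one, divided by the time they took to arrive.
    return (len(arrival_times) - 1) / elapsed


# Illustrative timestamps: first token at 0.18 s, then one token every 10 ms.
timestamps = [0.18, 0.19, 0.20, 0.21, 0.22]
rate = throughput_tokens_per_sec(timestamps)
print(f"Throughput: {rate:.0f} tokens/s")
```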

Numbers are representative benchmarks collected from public provider documentation and independent tests. Actual performance varies by region, load, and prompt length. Last updated: 2026-06-12.
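
The two metrics can be combined into a back-of-the-envelope estimate of total response time: the first-token latency plus the time to generate the remaining tokens at the listed throughput. The sketch below is a simplification that ignores the regional, load, and prompt-length variation noted above. `estimated_response_seconds` is a hypothetical helper, the 500-token output length is an arbitrary assumption, and the inputs are the Mistral Small and DeepSeek R1 figures from the table.

```python
def estimated_response_seconds(
    first_token_ms: float, tokens_per_sec: float, output_tokens: int
) -> float:
    """Estimate: first-token latency + time to stream the remaining tokens."""
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return first_token_ms / 1000 + (output_tokens - 1) / tokens_per_sec


# (model, first token in ms, throughput in tokens/s) taken from the table.
models = [("Mistral Small", 150, 190), ("DeepSeek R1", 1200, 55)]
for name, ftl_ms, tps in models:
    seconds = estimated_response_seconds(ftl_ms, tps, output_tokens=500)
    print(f"{name}: about {seconds:.1f} s for a 500-token reply")
```

For short replies the first-token latency dominates the estimate; for long outputs the throughput term does, which is why the two numbers are reported separately.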