How do I calculate embedding costs?

Multiply the number of documents by the average words per document, then convert words to tokens (we use 1.33 tokens per word as a rule of thumb). Multiply total tokens by the model price per 1 million tokens. If batch pricing is available and selected, apply the discount.

Which embedding model is cheapest?

For most use cases, OpenAI text-embedding-3-small and Jina embeddings v3 are among the most affordable at roughly $0.02 per 1M tokens. Costs scale linearly with the total number of tokens you embed.

What is the RAG cost estimate?

The RAG estimate combines the one-time cost of embedding your documents with the recurring cost of sending user queries to a retrieval-augmented generation pipeline, including both the query embedding and the generation LLM call.

Are these prices up to date?

Pricing is reviewed monthly. Provider rates can change, so always verify the latest figures in your provider dashboard before committing to a budget.

Free Embedding Cost Calculator — RAG & Vector DB Estimate

Model comparison

Model	Provider	Dimensions	Context	Effective price	Cost
text-embedding-3-small selected	OpenAI	1,536	8,191	$0.02/1M	$0.01
jina-embeddings-v3	Jina AI	1,024	8,192	$0.02/1M	$0.01
voyage-3-lite	Voyage AI	512	32,000	$0.03/1M	$0.02
text-embedding-ada-002	OpenAI	1,536	8,191	$0.10/1M	$0.07
embed-english-v3	Cohere	1,024	512	$0.10/1M	$0.07
embed-multilingual-v3	Cohere	1,024	512	$0.10/1M	$0.07
text-embedding-004	Google	768	2,048	$0.10/1M	$0.07
text-multilingual-embedding-002	Google	768	2,048	$0.10/1M	$0.07
voyage-3	Voyage AI	1,024	32,000	$0.10/1M	$0.07
mistral-embed	Mistral	1,024	8,092	$0.10/1M	$0.07
Titan Embeddings V2	Anthropic (AWS Bedrock)	1,024	8,192	$0.10/1M	$0.07
text-embedding-3-large	OpenAI	3,072	8,191	$0.13/1M	$0.09

How embedding pricing works

Pay per token, not per document

Embedding APIs charge by the number of tokens you send, not the number of documents. A 500-word document is roughly 665 tokens. If you embed 10,000 such documents, you send about 6.65 million tokens to the API.

Batch API discounts

OpenAI and several other providers offer a batch API that processes jobs asynchronously for a significant discount — often 50%. Use this calculator to see how much you can save if your embedding workload is not real-time.

RAG adds query costs

Retrieval-augmented generation has two cost components: the one-time embedding of your knowledge base, and the recurring cost of embedding each user query plus the LLM generation call. This calculator lets you estimate both.

Compare before you commit

Prices and context windows vary. A cheaper model may truncate long documents, while a more expensive model may deliver better retrieval accuracy. Use the comparison table to balance cost, context length, and output dimensions for your use case.

Embedding Cost Calculator

Estimated embedding cost

Calculation