Skip to main content
VePrompts
Models & Architecture

Context Window

The context window is the maximum number of tokens a model can consider in a single forward pass. It includes the system prompt, user messages, retrieved documents, and the model's own generated output. If the total exceeds the window, the oldest tokens are dropped or the request fails. Context windows vary widely. Small models may handle 4,000 tokens, while frontier models can process 128,000, 1,000,000, or even 10,000,000 tokens. Long context is useful for summarizing books, analyzing large codebases, and holding extended conversations without losing earlier details. A larger window does not always mean better results. Very long inputs can dilute attention, making the model miss important details. Techniques like RAG, selective summarization, and hierarchical chunking help fit the most relevant information into the window without exceeding the limit.

Published 2026-06-12

Explore the glossary

Find definitions for AI, LLM, MCP, RAG, agent, and prompt engineering terms.

Browse all terms

Related Resources