Context Window
The context window is the maximum number of tokens a model can consider at once. It covers the system prompt, user messages, retrieved documents, and the model's own generated output. If the total exceeds the window, either the request fails or the input is truncated, typically by dropping the oldest tokens, depending on how the model is served.
Context windows vary widely. Small models may handle about 4,000 tokens, while frontier models can process 128,000, 1,000,000, or even 10,000,000 tokens. Long context is useful for summarizing books, analyzing large codebases, and holding extended conversations without losing earlier details.
A larger window does not always mean better results. Very long inputs can dilute attention, so the model may miss important details, particularly ones buried in the middle of the input. Techniques like retrieval-augmented generation (RAG), selective summarization, and hierarchical chunking help fit the most relevant information into the window without exceeding the limit.
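As a rough illustration of the "oldest tokens are dropped" behavior, the sketch below trims a conversation so that it fits a token budget. The names `count_tokens`, `fit_to_window`, and `reserve_for_output` are hypothetical, and the whitespace-based token count is only a stand-in: real tokenizers split text differently, so actual counts will differ.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer (assumption): one token per word."""
    return len(text.split())


def fit_to_window(
    system_prompt: str,
    messages: List[str],
    window: int,
    reserve_for_output: int,
) -> List[str]:
    """Keep the most recent messages that fit in the window.

    The system prompt and a reserve for the model's reply are counted
    against the window first, since both share the same token limit.
    """
    budget = window - reserve_for_output - count_tokens(system_prompt)
    if budget < 0:
        raise ValueError("system prompt and output reserve exceed the window")

    kept: List[str] = []
    used = 0
    # Walk from newest to oldest so the oldest messages are dropped first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost

    kept.reverse()  # restore chronological order
    return kept


history = [
    "first question about setup",
    "a long answer " * 10,
    "follow-up question",
    "short answer",
]
trimmed = fit_to_window(
    "You are a helpful assistant.", history, window=20, reserve_for_output=5
)
print(trimmed)
```

With a 20-token window, the long early answer no longer fits, so only the two most recent messages are kept. A production setup would more likely summarize or retrieve older content than discard it outright.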
Published 2026-06-12
Related Resources
Large Language Model
Glossary: A neural network trained on vast text data to understand and generate human language.
DeepSeek Coder Architect
Prompt: Leverage DeepSeek Coder for complex software architecture, code generation, and technical problem-solving with advanced reasoning.
3D Printing Optimizer
Skill: Optimize 3D models for additive manufacturing considering orientation, supports, infill, and material properties.
Firecrawl
MCP Server: Official Firecrawl MCP Server. Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.
Transformer
Glossary: A neural network architecture using self-attention to process sequences in parallel, forming the basis of modern LLMs.