AI & LLM Glossary
Plain-English definitions for 248+ AI, LLM, MCP, RAG, and agent terms. Browse by category or search the encyclopedia.
A
A2A
Agents & ToolsThe Agent-to-Agent protocol, which lets agents discover one another, negotiate, and delegate tasks.
Abstractive Summarization
Coding & ApplicationsGenerating a summary using new phrasing rather than extracting sentences.
Accuracy
Evaluation & SafetyThe proportion of correct predictions out of total predictions.
Activation Function
FundamentalsA mathematical function applied to a neuron's output, introducing non-linearity so the network can learn complex patterns.
Adapter
Training & Fine-tuningA small neural module inserted into a pre-trained model and trained for a specific task.
Adversarial Attack
Evaluation & SafetyAn input designed to trick a model into making a mistake.
Agent
Agents & ToolsAn AI agent is a system that uses a language model to perceive its environment, make decisions, and take actions to reach a goal. Unlike a simple chatbot that only responds to prompts, an agent can loop: observe state, plan next steps, call tools, review results, and adapt until the task is done. Agents are built from several components. A planner breaks a goal into subtasks. A memory module stores conversation history and working context. A tool interface lets the agent call APIs, run code, query databases, or interact with other systems. A feedback loop checks whether each step moved the agent closer to the goal. Simple agents might answer a question by searching the web. Complex agents can write and test code, file pull requests, or coordinate with other agents. The more autonomy an agent has, the more important safety guardrails become, such as human approval for destructive actions and clear logging for every decision.
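The observe, plan, call-tool, review loop described above can be sketched in a few lines. This is a minimal illustration, not a real framework: `TOOLS`, `plan_next_step`, and `run_agent` are hypothetical names, and the planner is a hard-coded stand-in for a language-model call.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry; the tool name and its stub behavior are invented.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"stub results for '{query}'",
}


def plan_next_step(goal: str, memory: List[str]) -> dict:
    """Stand-in for a language-model call that chooses the next action."""
    if not memory:
        # Nothing observed yet, so gather information first.
        return {"action": "search", "input": goal}
    # Something was observed, so finish with the latest observation.
    return {"action": "finish", "input": memory[-1]}


def run_agent(goal: str, max_steps: int = 5) -> str:
    memory: List[str] = []              # working context carried across steps
    for _ in range(max_steps):          # hard step cap as a simple guardrail
        step = plan_next_step(goal, memory)
        if step["action"] == "finish":
            return step["input"]
        result = TOOLS[step["action"]](step["input"])  # call the chosen tool
        memory.append(result)           # record the result for the next plan
    return "stopped: step limit reached"
```

The step cap is the simplest kind of guardrail; a production agent would typically add logging and human approval for risky actions on top of it.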
Agent Card
Agents & ToolsA manifest describing an A2A agent's capabilities, endpoint, and authentication requirements.
AI Coding Assistant
Coding & ApplicationsA tool that helps write, review, or debug code using AI.
Alignment
Training & Fine-tuningThe process of ensuring a model behaves in ways consistent with human values and intentions.
Annotation
Coding & ApplicationsThe process of adding labels or metadata to training data.
API
Agents & ToolsApplication Programming Interface, a set of rules for software components to communicate.
API Pricing
Pricing & PerformanceThe cost structure for using a model or service via API, usually per input and output tokens.
Artificial General Intelligence
FundamentalsHypothetical AI that can understand, learn, and perform any intellectual task a human can do across any domain.
Artificial Intelligence
FundamentalsThe broad field of creating machines that can perform tasks requiring human-like intelligence, such as reasoning, learning, and perception.
Assistant
Coding & ApplicationsAn AI application that helps users complete tasks through conversation.
Attention
Models & ArchitectureA mechanism that lets a model focus on relevant parts of input when producing each output token.
Autonomy
Agents & ToolsThe degree to which an agent can operate without human intervention.
Autoregressive
Models & ArchitectureGenerating output one token at a time, using previously generated tokens as context for the next.
B
Backpropagation
FundamentalsThe algorithm used to train neural networks by propagating error gradients backward and updating weights.
Batch
FundamentalsA subset of training data processed together in one forward and backward pass.
Batch Processing
Pricing & PerformanceSending groups of requests together, often at a lower price but with higher latency.
Batch Size
FundamentalsThe number of training examples used in a single update step.
Benchmark
Evaluation & SafetyA standardized dataset and task used to compare models.
Bias
FundamentalsA learnable value added to a neuron's input before activation, helping the model fit data better.
BLEU
Evaluation & SafetyA metric that evaluates the quality of generated text by comparing it to reference text.
BM25
RAG & KnowledgeA ranking function used in keyword search to estimate document relevance.
BPE
Prompt EngineeringByte Pair Encoding, a subword tokenization algorithm used by many language models.
C
Caching
Pricing & PerformanceStoring and reusing previous results to reduce latency and cost.
CDN
Pricing & PerformanceContent Delivery Network, a geographically distributed network that speeds up content delivery.
Chain-of-Thought
Prompt EngineeringPrompting a model to show its reasoning step by step before giving a final answer.
Chatbot
Coding & ApplicationsA conversational interface that uses AI to interact with users.
Chunking
RAG & KnowledgeSplitting documents into smaller pieces before embedding and storing them.
CI/CD
Coding & ApplicationsContinuous Integration and Continuous Deployment, automated pipelines for building and releasing software.
Class Imbalance
Training & Fine-tuningWhen some classes or outcomes are far more common than others in training data.
Classification
Coding & ApplicationsAssigning input data to predefined categories.
Claude Code
Coding & ApplicationsAn agentic terminal coding tool powered by Claude.
Clustering
Coding & ApplicationsGrouping data points into clusters based on similarity.
Code Execution
Agents & ToolsRunning code generated by a model, usually in a controlled environment.
Cold Start
Coding & ApplicationsDifficulty in making predictions for new users or items with little historical data.
Collaborative Filtering
Coding & ApplicationsMaking recommendations based on patterns across many users.
Completion
Prompt EngineeringThe text generated by a model in response to a prompt.
Constitutional AI
Training & Fine-tuningA training approach where models critique and revise their own outputs according to a set of principles.
Content Filter
Evaluation & SafetyA system that blocks or flags disallowed content.
Content-Based Filtering
Coding & ApplicationsMaking recommendations based on attributes of items a user has liked.
Context Window
Models & ArchitectureThe context window is the maximum number of tokens a model can consider in a single forward pass. It includes the system prompt, user messages, retrieved documents, and the model's own generated output. If the total exceeds the window, the oldest tokens are dropped or the request fails. Context windows vary widely. Small models may handle 4,000 tokens, while frontier models can process 128,000, 1,000,000, or even 10,000,000 tokens. Long context is useful for summarizing books, analyzing large codebases, and holding extended conversations without losing earlier details. A larger window does not always mean better results. Very long inputs can dilute attention, making the model miss important details. Techniques like RAG, selective summarization, and hierarchical chunking help fit the most relevant information into the window without exceeding the limit.
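One common way to handle an overfull window is to keep the system prompt and drop the oldest messages first. The sketch below assumes a crude one-token-per-word counter; `count_tokens` and `fit_to_window` are hypothetical helpers, and a real application would use the model's own tokenizer.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude assumption for illustration: one token per whitespace-separated word.
    return len(text.split())


def fit_to_window(system_prompt: str, messages: List[str], max_tokens: int) -> List[str]:
    """Keep the system prompt, then keep the newest messages that still fit."""
    budget = max_tokens - count_tokens(system_prompt)
    kept: List[str] = []
    for message in reversed(messages):   # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break                        # this and all older messages are dropped
        kept.append(message)
        budget -= cost
    return list(reversed(kept))          # restore chronological order
```

For example, with a two-word system prompt and a budget of five tokens, `fit_to_window("be brief", ["one two three", "four five", "six"], 5)` keeps only the two newest messages.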
Continual Pre-training
Training & Fine-tuningFurther pre-training a model on additional domain-specific data before fine-tuning.
Conversational AI
Coding & ApplicationsAI systems designed for natural language dialogue.
Copilot
Coding & ApplicationsAn AI assistant embedded in a workflow to augment human work.
Cost Per Million Tokens
Pricing & PerformanceA common pricing unit for API-based language models.
Curriculum Learning
Training & Fine-tuningTraining a model on easier examples first and gradually increasing difficulty.
Cursor
Coding & ApplicationsAn AI-native code editor built on VS Code with strong agentic features.
D
Data Augmentation
Training & Fine-tuningCreating additional training examples from existing data to improve model robustness.
Data Leakage
Coding & ApplicationsWhen information that would not be available at prediction time, such as test-set examples or the target value itself, leaks into training and inflates measured performance.
Debugging
Coding & ApplicationsThe process of finding and fixing errors in software.
Decoder
Models & ArchitectureA model component that generates output sequences from an encoded representation.
Deep Learning
FundamentalsA branch of machine learning based on multi-layer neural networks that can learn complex patterns from large amounts of data.
Dense Model
Models & ArchitectureA model where all parameters are active during every forward pass.
DevOps
Coding & ApplicationsPractices that combine software development and IT operations to shorten delivery cycles.
Diffusion Model
Models & ArchitectureA generative model that learns to reverse a noise-adding process to create images, audio, or video.
Domain Adaptation
Training & Fine-tuningAdapting a model to perform better on data from a specific domain or industry.
Dropout
FundamentalsA regularization technique that randomly disables neurons during training to prevent co-adaptation.
E
Edge Deployment
Pricing & PerformanceRunning models close to end users to reduce latency.
Embedding
RAG & KnowledgeAn embedding is a list of numbers, usually called a vector, that represents the meaning of a piece of data. Semantically similar items end up close together in this numeric space, which lets a computer compare meaning using distance rather than exact keyword matches. Embedding models are trained to produce these vectors. For text, the model reads a sentence or document and outputs a dense vector, often with hundreds or thousands of dimensions. You can then measure similarity with cosine similarity or Euclidean distance. Two sentences about payment processing will have embeddings closer to each other than a sentence about baseball, even if they share no common words. Embeddings power search, recommendations, clustering, and RAG. A typical RAG system stores document chunks as vectors in a vector database and retrieves the nearest neighbors to a user's query embedding. Embeddings can also represent images, audio, and other modalities when the model is trained on multimodal data.
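Cosine similarity, mentioned above, is short enough to write out by hand. The three-dimensional vectors below are made up for illustration; real embedding models output hundreds or thousands of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented toy "embeddings": two payment-related sentences and one about baseball.
payments_a = [0.9, 0.1, 0.0]
payments_b = [0.8, 0.2, 0.1]
baseball = [0.0, 0.1, 0.9]

similar = cosine_similarity(payments_a, payments_b)
different = cosine_similarity(payments_a, baseball)
```

With these toy values, `similar` comes out much higher than `different`, which is the property a retrieval system relies on.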
Embedding Model
Models & ArchitectureA model that converts data into dense numerical vectors that capture semantic meaning.
Encoder
Models & ArchitectureA model component that processes input into a dense internal representation.
Encoder-Decoder
Models & ArchitectureAn architecture that first encodes input into a representation and then decodes it into output.
Entity
RAG & KnowledgeA distinct object or concept represented in a knowledge graph, such as a person, place, or product.
Epoch
FundamentalsOne complete pass through the entire training dataset during model training.
Evaluation
Evaluation & SafetyThe process of measuring a model's performance on tasks or benchmarks.
Explainability
Evaluation & SafetyThe degree to which a model's decisions can be understood by humans.
Extractive Summarization
Coding & ApplicationsCreating a summary by selecting existing sentences or phrases from the source.
F
F1 Score
Evaluation & SafetyThe harmonic mean of precision and recall.
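As a small worked sketch, precision, recall, and F1 can all be computed from counts of true positives, false positives, and false negatives. The function name and the example counts are illustrative only.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts: 8 true positives, 2 false positives, 8 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

Because the harmonic mean is pulled toward the smaller value, a model cannot score a high F1 by maximizing only one of precision or recall.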
Factuality
Evaluation & SafetyThe degree to which generated content is factually correct.
Fairness
Evaluation & SafetyThe property of a model treating different groups equitably.
Feature
Coding & ApplicationsAn individual measurable property or characteristic of data used by a model.
Feature Engineering
Coding & ApplicationsTransforming raw data into features that improve model performance.
Few-Shot Learning
Training & Fine-tuningLearning a task from only a few examples, often by including them in the prompt.
Few-Shot Prompting
Prompt EngineeringIncluding examples of desired input-output pairs in the prompt to guide the model.
Fine-tuning
Training & Fine-tuningFine-tuning is the process of further training a pre-trained model on a smaller, task-specific dataset so it becomes better at a particular job. The base model already knows grammar, facts, and reasoning from pre-training; fine-tuning teaches it the style, format, or domain you care about. Common reasons to fine-tune include matching a brand voice, classifying support tickets, extracting structured fields from documents, and improving performance on low-resource languages. You typically need hundreds to thousands of high-quality examples. Each example pairs an input with the desired output, and the model's weights are updated to reduce the error on those examples. Fine-tuning is not always the right first step. Prompt engineering, retrieval augmentation, and few-shot examples are faster and cheaper to iterate on. Fine-tuning becomes worthwhile when the behavior you want is hard to describe in a prompt, must be consistent at scale, or needs to run without sending long examples in every request. Techniques like LoRA and QLoRA make fine-tuning feasible on consumer hardware by freezing the original weights and training only small low-rank adapter matrices; QLoRA additionally keeps the frozen base model in quantized form to save memory.
First-Token Latency
Pricing & PerformanceThe time until the first token of a response is received.
Foundation Model
Models & ArchitectureA large model trained on broad data that can be adapted to many downstream tasks.
Function Calling
Prompt EngineeringA model capability to generate calls to external functions with structured arguments.
G
Generalization
FundamentalsA model's ability to perform well on new, unseen data rather than only memorizing training examples.
Generative AI
FundamentalsAI systems that create new content such as text, images, audio, or code based on learned patterns.
GGUF
Models & ArchitectureA binary format for storing quantized models for efficient local inference.
GitHub Copilot
Coding & ApplicationsAn AI pair programmer from GitHub that provides code suggestions across editors.
Gradient Descent
FundamentalsAn optimization algorithm that iteratively adjusts parameters to minimize a model's loss function.
GraphQL
MCP & ProtocolsA query language for APIs that allows clients to request exactly the data they need.
Ground Truth
Coding & ApplicationsThe accurate reference answer used to evaluate model predictions.
gRPC
MCP & ProtocolsA high-performance RPC framework that uses protocol buffers for service definitions.
Guardrails
Evaluation & SafetyControls that constrain model behavior to stay within acceptable boundaries.
H
Hallucination
Evaluation & SafetyWhen a model generates plausible-sounding but false or unsupported information.
Human-in-the-Loop
Agents & ToolsA design where humans review or approve agent actions at key decision points.
Hybrid Search
RAG & KnowledgeCombining vector similarity with keyword or structured filtering for retrieval.
Hyperparameter
FundamentalsA configuration value set before training begins, such as learning rate or batch size.
I
IDE
Coding & ApplicationsIntegrated Development Environment, a software application that provides tools for coding.
In-Context Learning
Prompt EngineeringA model's ability to learn a task from examples embedded directly in the prompt.
Inference
FundamentalsThe process of running a trained model on new input data to produce an output or prediction.
Ingestion Pipeline
RAG & KnowledgeThe process of loading, chunking, embedding, and storing documents for retrieval.
Input Token
Pricing & PerformanceA token counted from the prompt sent to a model.
Instruction Tuning
Training & Fine-tuningFine-tuning a model on instruction-following examples to improve its ability to respond to user requests.
Interpretability
Evaluation & SafetyThe study of understanding how models represent and process information internally.
J
K
Keyword Search
RAG & KnowledgeRetrieving documents based on exact or approximate word matches.
Knowledge Base
RAG & KnowledgeA structured repository of information an AI system can query or retrieve from.
Knowledge Graph
RAG & KnowledgeA network of entities and relationships used to represent structured knowledge.
KV Cache
Models & ArchitectureA cache of key and value tensors used to speed up autoregressive generation by avoiding redundant computation.
L
Label
Coding & ApplicationsThe correct output associated with a training example.
Large Language Model
Models & ArchitectureA neural network trained on vast text data to understand and generate human language.
Latency
Pricing & PerformanceThe delay between a request and the start of a response.
Learning Rate
FundamentalsA hyperparameter that controls how much model weights are updated during each training step.
llama.cpp
Pricing & PerformanceA C/C++ inference library for running Llama and other open-weight models efficiently on consumer hardware.
Load Balancing
Pricing & PerformanceDistributing requests across multiple servers to improve reliability and performance.
Local Model
Pricing & PerformanceA model that runs on local hardware without requiring cloud API calls.
Logging
Coding & ApplicationsRecording events and messages from software for analysis.
Long Context
Models & ArchitectureThe ability of a model to process very large context windows, often hundreds of thousands of tokens.
LoRA
Training & Fine-tuningLow-Rank Adaptation, a parameter-efficient fine-tuning method that updates small adapter matrices instead of all weights.
Loss Function
FundamentalsA function that measures how far a model's predictions are from the correct answers during training.
M
Machine Learning
FundamentalsA subset of AI where systems improve at tasks through experience and data without being explicitly programmed.
Matrix Factorization
Coding & ApplicationsA technique that decomposes user-item interaction matrices into latent factors.
Max Tokens
Prompt EngineeringThe maximum number of tokens a model is allowed to generate in a response.
MCP
MCP & ProtocolsMCP stands for Model Context Protocol. It is an open standard that lets AI clients connect to external tools, data sources, and prompts through a single, consistent interface. Anthropic introduced MCP in late 2024, and it has since been adopted by Claude Desktop, Cursor, Cline, VS Code, Windsurf, and a growing list of community clients. An MCP server is a small program that exposes three things: tools the model can call, resources the client can read, and prompts that help users accomplish common tasks. An MCP client discovers those capabilities and decides when to invoke them. Transport is usually stdio for local servers and an HTTP-based transport for remote ones: originally Server-Sent Events, with newer revisions of the specification defining Streamable HTTP. For developers, MCP removes the need to build a custom integration for every API. You write one server, and any compatible client can use it. For users, it means AI assistants can access files, databases, SaaS tools, and web services through one shared mechanism instead of each client reinventing the wheel.
MCP Client
MCP & ProtocolsAn application that connects to MCP servers and uses their capabilities.
MCP Server
MCP & ProtocolsA program that exposes tools, resources, and prompts via the Model Context Protocol.
Mechanistic Interpretability
Evaluation & SafetyA research area that reverse-engineers neural networks to understand their internal circuits.
Metric
Evaluation & SafetyA quantitative measure of model performance, such as accuracy, F1, or BLEU.
Mixture of Experts
Models & ArchitectureAn architecture where only a subset of specialized sub-networks is activated per input, improving efficiency.
MLOps
Coding & ApplicationsPractices for deploying and maintaining machine learning models in production.
Model Context Protocol
MCP & ProtocolsAn open standard that lets AI assistants connect to external data sources and tools through a common interface.
Moderation
Evaluation & SafetyFiltering or flagging content that violates safety policies.
Monitoring
Coding & ApplicationsTracking system health, performance, and behavior over time.
Multi-Agent System
Agents & ToolsA system where multiple agents collaborate, compete, or delegate tasks to achieve complex goals.
Multi-Head Attention
Models & ArchitectureRunning multiple attention mechanisms in parallel to capture different kinds of relationships between tokens.
Multimodal Model
Models & ArchitectureA model that can process or generate multiple types of data, such as text, images, and audio.
N
Named Entity Recognition
Coding & ApplicationsIdentifying and classifying named entities such as people, organizations, and locations in text.
Natural Language Processing
FundamentalsThe field focused on enabling computers to understand, interpret, and generate human language.
Nearest Neighbor
RAG & KnowledgeFinding the closest data points to a query in a vector space.
Neural Network
FundamentalsA computational model inspired by biological neurons, organized in layers that process input data to produce outputs.
Normalization
Coding & ApplicationsScaling data to a standard range or distribution.
O
OAuth
MCP & ProtocolsAn authorization framework for delegated access to resources.
Observability
Coding & ApplicationsThe ability to understand internal system state from external outputs.
Ollama
Pricing & PerformanceA tool for running open-source models locally with simple commands.
On-Device
Pricing & PerformanceRunning a model locally on a user's device rather than on a remote server.
One-Shot
Training & Fine-tuningPerforming a task after seeing a single example.
ONNX
Models & ArchitectureAn open format for representing machine learning models, enabling cross-framework deployment.
OpenAPI
MCP & ProtocolsA specification format for describing HTTP APIs.
Orchestration
Agents & ToolsCoordinating multiple tools, agents, or services to complete a workflow.
Output Parsing
Prompt EngineeringExtracting structured data from model outputs, often using schemas or regular expressions.
Output Token
Pricing & PerformanceA token generated by the model in its response.
Overfitting
FundamentalsWhen a model learns training data too closely and performs poorly on unseen data.
Overlap
RAG & KnowledgeShared text between adjacent chunks to preserve context across chunk boundaries.
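A minimal sketch of chunking with overlap, assuming chunks are measured in whitespace-separated words (real pipelines often count tokens or characters instead). `chunk_words` is a hypothetical helper name.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into word chunks where neighbors share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap          # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break                        # the last window already reached the end
    return chunks
```

For instance, `chunk_words("a b c d e f g", chunk_size=4, overlap=2)` yields three chunks, each sharing two words with its neighbor, so a sentence that straddles a boundary still appears whole in at least one chunk.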
P
Parameter
FundamentalsAn internal variable of a neural network, such as a weight or bias, whose value is learned during training and determines model behavior.
Parameter-Efficient Fine-Tuning
Training & Fine-tuningMethods that adapt a pre-trained model to new tasks while updating only a small fraction of parameters.
Perplexity
Evaluation & SafetyA measure of how well a probability model predicts a sample; lower is better.
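Concretely, perplexity is the exponential of the average negative log-probability the model assigned to each actual token. The sketch below assumes you already have those per-token probabilities; the input values in the note are invented.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood) over the observed tokens."""
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)
```

A model that assigns probability 0.25 to every token has a perplexity of 4, which can be read as being as uncertain as a uniform choice among four options.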
Persona
Prompt EngineeringA defined identity or character assigned to a model in a prompt.
Personalization
Coding & ApplicationsTailoring outputs or recommendations to individual users.
Pipeline
Agents & ToolsA linear sequence of data processing or model steps.
Planning
Agents & ToolsThe process of deciding which actions to take and in what order to achieve a goal.
Plugin
Agents & ToolsAn add-on module that extends a system's capabilities.
Pre-training
Training & Fine-tuningTraining a model on a large corpus to learn general language patterns before task-specific adaptation.
Precision
Evaluation & SafetyThe proportion of predicted positives that are actually correct.
Preference Model
Training & Fine-tuningA model that learns to rank outputs based on which ones humans prefer.
Preprocessing
Coding & ApplicationsCleaning and transforming raw data before it is fed to a model.
Prompt
Prompt EngineeringThe input text given to a language model to elicit a desired response.
Prompt Chaining
Prompt EngineeringBreaking a complex task into a sequence of prompts where each step uses the previous output.
Prompt Engineering
Prompt EngineeringPrompt engineering is the practice of crafting inputs to a language model so it produces better outputs without changing the model's weights. It covers word choice, structure, examples, constraints, and the order in which information appears. A well-engineered prompt can turn a mediocre response into a precise, actionable one. Effective prompts are usually clear, specific, and formatted. They state the task, define the audience, set the output format, and include any constraints. Adding examples, known as few-shot prompting, helps the model understand patterns that are hard to describe in words. Breaking complex tasks into steps, called chain-of-thought prompting, improves reasoning and arithmetic. Prompt engineering is iterative. You write a prompt, test it on diverse inputs, measure the results, and refine. Tools like the VePrompts Prompt Optimizer can surface issues such as ambiguity, missing constraints, or conflicting instructions. Good prompt engineering is often the fastest way to improve an AI feature before investing in fine-tuning or custom infrastructure.
Prompt Injection
Prompt EngineeringAn attack where malicious input overrides or leaks system instructions.
Prompt Template
MCP & ProtocolsA reusable prompt pattern provided by an MCP server for common tasks.
Protocol Buffers
MCP & ProtocolsA language-neutral binary serialization format developed by Google.
Q
Quality Assurance
Coding & ApplicationsProcesses designed to ensure products meet quality standards.
Quantization
Models & ArchitectureReducing the precision of model weights to decrease memory usage and increase inference speed.
Query Rewriting
RAG & KnowledgeTransforming a user query to improve retrieval, such as expanding acronyms or adding synonyms.
Quota
Pricing & PerformanceA maximum allowance of usage for an account or key.
R
RAG
Prompt EngineeringRAG stands for Retrieval-Augmented Generation. It is a pattern that gives a language model access to information outside its training data by fetching relevant documents at query time and including them in the prompt. Instead of relying only on memorized facts, the model reasons over retrieved snippets, which tends to make answers more accurate, current, and traceable. A typical RAG pipeline has four stages. First, documents are split into chunks and converted into embeddings using an embedding model. Second, those embeddings are stored in a vector database. Third, when a user asks a question, the system embeds the query and searches the database for the closest chunks. Finally, the retrieved chunks are added to the prompt as context, and the model generates an answer grounded in that evidence. RAG is especially useful when answers depend on private data, such as internal wikis, support tickets, or product documentation. It can also reduce hallucination because the answer is grounded in retrieved text that the model can cite. Teams often tune RAG by changing chunk size, overlap, reranking algorithms, and query rewriting strategies.
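The four stages can be sketched end to end with toy components. Everything here is a stand-in chosen for illustration: `embed` is a bag-of-words counter rather than a real embedding model, a plain Python list plays the role of the vector database, and the prompt wording is invented.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """Embed the query and return the k most similar chunks."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: List[str], k: int = 1) -> str:
    """Place the retrieved chunks in the prompt as context for the model."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Invented document chunks standing in for an indexed knowledge base.
chunks = [
    "refunds are processed within five days",
    "the office is closed on public holidays",
]
prompt = build_prompt("how are refunds processed", chunks)
```

The resulting `prompt` would then be sent to a language model; swapping in a real embedding model and vector store changes `embed` and `retrieve` but not the overall shape.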
Rate Limit
Pricing & PerformanceA cap on the number of requests or tokens allowed in a time window.
Re-ranking
RAG & KnowledgeA second-stage model that scores and reorders retrieved documents for better relevance.
ReAct
Agents & ToolsA pattern where an agent Reasons and Acts in alternating steps to solve tasks.
Reasoning Model
Models & ArchitectureA model optimized for step-by-step logical reasoning and complex problem solving.
Recall
Evaluation & SafetyThe proportion of actual positives that were correctly identified.
Recommendation
Coding & ApplicationsSuggesting items or actions to users based on data.
Red Teaming
Evaluation & SafetyAttempting to find vulnerabilities, biases, or harmful behaviors in a model.
Reflection
Agents & ToolsAn agent evaluating its own output and revising it based on critique.
Regularization
FundamentalsTechniques used to reduce overfitting by discouraging overly complex models.
Reinforcement Learning
Coding & ApplicationsLearning by interacting with an environment and receiving rewards or penalties.
Reinforcement Learning from Human Feedback
Training & Fine-tuningTraining a model using human preference signals to make outputs more helpful and harmless.
Relationship
RAG & KnowledgeA connection between two entities in a knowledge graph.
Resource
MCP & ProtocolsRead-only data exposed by an MCP server that a client can pull into context.
REST
MCP & ProtocolsRepresentational State Transfer, an architectural style for designing networked APIs.
Retrieval-Augmented Generation
Prompt EngineeringGenerating responses grounded in retrieved external documents to improve accuracy and recency.
Reward Model
Training & Fine-tuningA model trained to score outputs according to human preferences, used in RLHF.
RLHF
Training & Fine-tuningShort for Reinforcement Learning from Human Feedback.
Robustness
Evaluation & SafetyA model's ability to maintain performance under noisy or adversarial inputs.
Role Prompting
Prompt EngineeringAsking the model to assume a specific role or persona to shape its responses.
ROUGE
Evaluation & SafetyA set of metrics for evaluating automatic summarization by comparing overlap with reference summaries.
S
Safety
Evaluation & SafetyPractices that reduce harmful, unethical, or dangerous model outputs.
Sandbox
Agents & ToolsAn isolated execution environment that limits what code can access.
Scaling
Pricing & PerformanceAdjusting compute resources to handle varying workloads.
Self-Attention
Models & ArchitectureAttention applied within a single sequence, allowing each token to relate to every other token.
Self-Correction
Agents & ToolsAn agent identifying and fixing its own mistakes.
Semantic Search
Coding & ApplicationsSearching by meaning rather than exact keyword matches, often using embeddings.
Sentiment Analysis
Coding & ApplicationsDetermining the emotional tone or opinion expressed in text.
Sequence-to-Sequence
Coding & ApplicationsA model architecture that maps an input sequence to an output sequence.
Similarity Search
RAG & KnowledgeFinding items with embeddings close to a query embedding, usually by cosine or Euclidean distance.
Skill
Agents & ToolsA specific capability advertised by an agent or service.
Sparse Model
Models & ArchitectureA model where most parameters are inactive for any given input, reducing compute per forward pass.
SSE
MCP & ProtocolsServer-Sent Events, an HTTP-based transport for streaming messages from server to client.
Standardization
Coding & ApplicationsTransforming data to have zero mean and unit variance.
STDIO
MCP & ProtocolsA transport that uses standard input and output for local MCP server communication.
Stop Sequence
Prompt EngineeringA string that signals the model to stop generating further tokens.
Structured Output
Prompt EngineeringRequiring the model to produce output conforming to a defined schema.
Summarization
Coding & ApplicationsProducing a shorter version of a longer text while preserving key information.
Supervised Learning
Coding & ApplicationsTraining a model on labeled input-output pairs.
Synthetic Data
Training & Fine-tuningData generated by models or simulations rather than collected from real-world sources.
System Prompt
Prompt EngineeringA system prompt is the high-level instruction that sets the model's role, tone, constraints, and behavior for a conversation. It sits at the start of the context and influences every response that follows. While users see the assistant's reply, they usually do not see the system prompt unless the application exposes it. A good system prompt is specific and scoped. Instead of saying "You are helpful," it might say "You are a senior React reviewer who gives concise feedback in bullet points, flags security issues, and never writes full code replacements." This reduces ambiguity and makes the model's output more consistent across sessions. System prompts are also the first line of defense for safety and product requirements. You can use them to enforce output formats, reject off-topic requests, require citations, or ask the model to disclose uncertainty. Because they carry so much influence, small changes to a system prompt can produce larger improvements than adding more examples to user messages.
T
Temperature
Prompt EngineeringTemperature is a sampling parameter that controls how random a language model's outputs are. The logits, or raw scores, that the model assigns to each possible next token are divided by the temperature before they are turned into probabilities and a token is chosen. A lower temperature makes the model more conservative and deterministic; a higher temperature makes it more creative and varied. At temperature zero, the model almost always picks the highest-scoring token, which is ideal for tasks like code generation, factual answers, and structured output where consistency matters. At temperature one or above, the model is more willing to sample lower-scoring tokens, which can produce surprising phrasing, creative writing, and diverse brainstorming ideas. There is no universal best setting. Coding and data extraction usually benefit from low temperatures around 0.1 to 0.3. Marketing copy, fiction, and idea generation often feel better at 0.7 to 1.0. If outputs are too repetitive, raise the temperature. If they become erratic or off-topic, lower it.
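The effect is easy to see numerically. The sketch below applies a softmax to temperature-scaled logits; the logit values are invented, and treating a temperature of zero as a pure greedy pick is an assumption made here to avoid dividing by zero.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Assumed convention: treat zero as greedy, all mass on the top score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]                      # invented scores for three tokens
cold = softmax_with_temperature(logits, 0.2)  # sharply peaked distribution
hot = softmax_with_temperature(logits, 1.5)   # flatter distribution
```

At the low temperature nearly all probability lands on the first token, while at the higher one the second and third tokens keep a meaningful share.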
Test-Time Compute
Models & ArchitectureSpending more computation during inference to improve output quality, such as through reasoning or search.
Testing
Coding & ApplicationsEvaluating software to ensure it behaves as expected.
Throttling
Pricing & PerformanceSlowing or limiting requests to enforce rate limits.
Throughput
Pricing & PerformanceThe number of tokens or requests processed per unit of time.
Token
Prompt EngineeringA token is the basic unit a language model reads and writes. It can be a whole word, part of a word, or even a single punctuation mark. Models do not see raw text; they see a sequence of token IDs produced by a tokenizer. Different models use different tokenizers, so the same sentence may cost a different number of tokens on GPT-4o than on Claude or Gemini. As a rough guide, one token is about four English characters or three quarters of a word. By that rule a 500-word article is roughly 650 to 900 tokens, depending on the tokenizer. Tokens matter for two reasons. First, API pricing is usually per token, so longer prompts and outputs cost more. Second, models have a context window measured in tokens; if your input exceeds that limit, the model cannot process it. Tools like the VePrompts tokenizer show you exactly how a specific model splits your text.
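The rule of thumb and per-token pricing combine into a quick back-of-the-envelope estimate. This sketch assumes the four-characters-per-token heuristic from above; the function names and the prices in the example are made up and do not reflect any real provider.

```python
import math


def estimate_tokens(text: str) -> int:
    # Heuristic only: about four English characters per token.
    return math.ceil(len(text) / 4)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost in the same currency unit as the per-million-token prices."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000


# Invented prices: 2.00 per million input tokens, 10.00 per million output tokens.
cost = estimate_cost(1000, 500, 2.0, 10.0)
```

For exact counts, use the tokenizer for the specific model, since the heuristic can be off substantially for code or non-English text.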
Tokenization
Coding & ApplicationsSplitting text into tokens for model processing.
Tokenizer
Prompt EngineeringA tool that converts text into tokens for model processing.
Tokens Per Second
Pricing & PerformanceThe rate at which a model generates tokens after the first one.
Tool
Agents & ToolsAn external function an agent can call to perform an action or retrieve data.
Tool Use
Prompt EngineeringThe ability of a model to invoke external tools or APIs to complete tasks.
Top-k
Prompt EngineeringA sampling parameter that limits token selection to the k highest-probability candidates.
Top-p
Prompt EngineeringThe nucleus sampling parameter, which limits token selection to the smallest set of top-ranked tokens whose cumulative probability reaches or exceeds p.
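Both filters can be sketched over a small probability table. The token probabilities below are invented, and the helper names are hypothetical; each function keeps a subset of candidates and renormalizes so the kept probabilities sum to one.

```python
from typing import Dict


def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most probable tokens, then renormalize."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def top_p_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the smallest top-ranked set whose cumulative probability reaches p."""
    kept = []
    cumulative = 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Invented next-token distribution.
probs = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
by_k = top_k_filter(probs, 2)       # keeps "the" and "a"
by_p = top_p_filter(probs, 0.9)     # keeps "the", "a", and "cat"
```

Top-k always keeps a fixed number of candidates, whereas top-p keeps more candidates when the distribution is flat and fewer when it is peaked.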
Toxicity
Evaluation & SafetyThe presence of harmful, offensive, or abusive content in model outputs.
Tracing
Coding & ApplicationsRecording the path of a request through a system to diagnose issues.
Transfer Learning
Training & Fine-tuningLeveraging knowledge learned from one task to improve performance on a related task.
Transformer
Models & ArchitectureA neural network architecture using self-attention to process sequences in parallel, forming the basis of modern LLMs.
Translation
Coding & ApplicationsConverting text from one language to another.
Transport
MCP & ProtocolsThe communication channel between an MCP client and server, such as stdio or SSE.
Triple
RAG & KnowledgeA subject-predicate-object statement representing a fact in a knowledge graph.
Trust
Evaluation & SafetyConfidence that a model will behave reliably, safely, and as intended.
U
V
Vector Database
RAG & KnowledgeA database optimized for storing and searching high-dimensional embeddings.
Vision-Language Model
Models & ArchitectureA model that understands both images and text, often enabling image captioning or visual question answering.
Vocabulary
Prompt EngineeringThe set of tokens a model recognizes.
VS Code
Coding & ApplicationsA popular source-code editor developed by Microsoft.
W
Webhook
Agents & ToolsAn HTTP callback that triggers an action when an event occurs.
Weight
FundamentalsA numerical value in a neural network that scales the strength of connections between neurons.
Workflow
Agents & ToolsA predefined sequence of steps that may include models, tools, and conditional logic.