VePrompts

AI & LLM Glossary

Plain-English definitions for 248+ AI, LLM, MCP, RAG, and agent terms. Browse by category or search the encyclopedia.

248 terms

A

A2A

Agents & Tools

Agent-to-Agent, a protocol that lets agents discover one another, negotiate, and delegate tasks.

Abstractive Summarization

Coding & Applications

Generating a summary using new phrasing rather than extracting sentences.

Accuracy

Evaluation & Safety

The proportion of correct predictions out of total predictions.

Activation Function

Fundamentals

A mathematical function applied to a neuron's output, introducing non-linearity so the network can learn complex patterns.

Adapter

Training & Fine-tuning

A small neural module inserted into a pre-trained model and trained for a specific task.

Adversarial Attack

Evaluation & Safety

An input designed to trick a model into making a mistake.

Agent

Agents & Tools

An AI agent is a system that uses a language model to perceive its environment, make decisions, and take actions to reach a goal. Unlike a simple chatbot that only responds to prompts, an agent can loop: observe state, plan next steps, call tools, review results, and adapt until the task is done. Agents are built from several components. A planner breaks a goal into subtasks. A memory module stores conversation history and working context. A tool interface lets the agent call APIs, run code, query databases, or interact with other systems. A feedback loop checks whether each step moved the agent closer to the goal. Simple agents might answer a question by searching the web. Complex agents can write and test code, file pull requests, or coordinate with other agents. The more autonomy an agent has, the more important safety guardrails become, such as human approval for destructive actions and clear logging for every decision.
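
The observe, plan, act, and review loop described above can be sketched in a few lines. This is a minimal illustration, not a production design: `scripted_policy` is a hypothetical stand-in for the language model, `fake_search` is a hypothetical tool, and the step limit is one simple guardrail.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> function taking and returning a string.
Tools = Dict[str, Callable[[str], str]]
Step = Tuple[str, str, str]  # (action, argument, result)


def scripted_policy(goal: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the language model: choose the next (action, argument).

    A real agent would send the goal and the history to a model here.
    """
    if not history:
        return "search", goal            # first step: gather information
    return "finish", history[-1][2]      # then stop and report the last result


def run_agent(goal: str, tools: Tools, max_steps: int = 5) -> str:
    history: List[Step] = []             # memory: every step is recorded
    for _ in range(max_steps):           # guardrail: bounded number of steps
        action, argument = scripted_policy(goal, history)  # plan
        if action == "finish":
            return argument
        if action in tools:
            result = tools[action](argument)                # call a tool
        else:
            result = f"error: unknown tool {action!r}"
        history.append((action, argument, result))          # review next turn
    return "stopped: step limit reached"


def fake_search(query: str) -> str:
    """Hypothetical tool that pretends to search the web."""
    return f"top result for '{query}'"


print(run_agent("weather in Paris", {"search": fake_search}))
```

A real agent would replace the scripted policy with a model call and add approval checks before destructive tools run.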

Agent Card

Agents & Tools

A manifest describing an A2A agent's capabilities, endpoint, and authentication requirements.

AI Coding Assistant

Coding & Applications

A tool that helps write, review, or debug code using AI.

Alignment

Training & Fine-tuning

The process of ensuring a model behaves in ways consistent with human values and intentions.

Annotation

Coding & Applications

The process of adding labels or metadata to training data.

API

Agents & Tools

Application Programming Interface, a set of rules for software components to communicate.

API Pricing

Pricing & Performance

The cost structure for using a model or service via API, usually per input and output tokens.

Artificial General Intelligence

Fundamentals

Hypothetical AI that can understand, learn, and perform any intellectual task a human can do across any domain.

Artificial Intelligence

Fundamentals

The broad field of creating machines that can perform tasks requiring human-like intelligence, such as reasoning, learning, and perception.

Assistant

Coding & Applications

An AI application that helps users complete tasks through conversation.

Attention

Models & Architecture

A mechanism that lets a model focus on relevant parts of input when producing each output token.

Autonomy

Agents & Tools

The degree to which an agent can operate without human intervention.

Autoregressive

Models & Architecture

Generating output one token at a time, using previously generated tokens as context for the next.

B

C

Caching

Pricing & Performance

Storing and reusing previous results to reduce latency and cost.

CDN

Pricing & Performance

Content Delivery Network, a geographically distributed network that speeds up content delivery.

Chain-of-Thought

Prompt Engineering

Prompting a model to show its reasoning step by step before giving a final answer.

Chatbot

Coding & Applications

A conversational interface that uses AI to interact with users.

Chunking

RAG & Knowledge

Splitting documents into smaller pieces before embedding and storing them.
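
As a rough sketch, chunking can be done by sliding a fixed-size window over the words of a document. The word-based sizes and the overlap value here are assumptions for illustration; real systems often chunk by tokens, sentences, or headings instead.

```python
from typing import List


def chunk_words(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into chunks of chunk_size words, sharing overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap          # how far the window moves each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break                        # the last window reached the end
    return chunks


print(chunk_words("a b c d e f g", chunk_size=3, overlap=1))
```

Overlap repeats a few words across neighboring chunks so that a sentence cut at a boundary still appears whole in at least one chunk.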

CI/CD

Coding & Applications

Continuous Integration and Continuous Deployment, automated pipelines for building and releasing software.

Class Imbalance

Training & Fine-tuning

When some classes or outcomes are far more common than others in training data.

Classification

Coding & Applications

Assigning input data to predefined categories.

Claude Code

Coding & Applications

An agentic terminal coding tool powered by Claude.

Clustering

Coding & Applications

Grouping data points into clusters based on similarity.

Code Execution

Agents & Tools

Running code generated by a model, usually in a controlled environment.

Cold Start

Coding & Applications

Difficulty in making predictions for new users or items with little historical data.

Collaborative Filtering

Coding & Applications

Making recommendations based on patterns across many users.

Completion

Prompt Engineering

The text generated by a model in response to a prompt.

Constitutional AI

Training & Fine-tuning

A training approach where models critique and revise their own outputs according to a set of principles.

Content Filter

Evaluation & Safety

A system that blocks or flags disallowed content.

Content-Based Filtering

Coding & Applications

Making recommendations based on attributes of items a user has liked.

Context Window

Models & Architecture

The context window is the maximum number of tokens a model can consider in a single forward pass. It includes the system prompt, user messages, retrieved documents, and the model's own generated output. If the total exceeds the window, the oldest tokens are dropped or the request fails. Context windows vary widely. Small models may handle 4,000 tokens, while frontier models can process 128,000, 1,000,000, or even 10,000,000 tokens. Long context is useful for summarizing books, analyzing large codebases, and holding extended conversations without losing earlier details. A larger window does not always mean better results. Very long inputs can dilute attention, making the model miss important details. Techniques like RAG, selective summarization, and hierarchical chunking help fit the most relevant information into the window without exceeding the limit.
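
One common way to stay inside the window is to drop the oldest messages first. The sketch below assumes a crude estimate of four characters per token rather than a real tokenizer, and the function names are hypothetical.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (not a real tokenizer)."""
    return max(1, len(text) // 4)


def fit_to_window(system_prompt: str, messages: List[str],
                  max_tokens: int, reserve_for_output: int) -> List[str]:
    """Keep the newest messages that fit in the budget; drop the oldest."""
    budget = max_tokens - reserve_for_output - estimate_tokens(system_prompt)
    kept: List[str] = []
    for message in reversed(messages):   # walk from newest to oldest
        cost = estimate_tokens(message)
        if cost > budget:
            break                        # everything older is dropped too
        kept.append(message)
        budget -= cost
    kept.reverse()                       # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # three messages of about 10 tokens
print(len(fit_to_window("abcd", history, max_tokens=31, reserve_for_output=10)))
```

Reserving part of the budget for the model's output matters because generated tokens count against the same window.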

Continual Pre-training

Training & Fine-tuning

Further pre-training a model on additional domain-specific data before fine-tuning.

Conversational AI

Coding & Applications

AI systems designed for natural language dialogue.

Copilot

Coding & Applications

An AI assistant embedded in a workflow to augment human work.

Cost Per Million Tokens

Pricing & Performance

A common pricing unit for API-based language models.
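
A short worked example of the arithmetic, using made-up prices. The figures of 3.00 and 15.00 dollars per million tokens are placeholders for illustration, not any provider's actual rates.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost of one request in the same currency as the prices."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices: 3.00 per million input tokens, 15.00 per million output.
cost = request_cost(2_000, 500, 3.00, 15.00)
print(f"{cost:.4f}")
```

Input and output tokens are usually priced separately, so the two terms are computed independently before dividing by one million.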

Curriculum Learning

Training & Fine-tuning

Training a model on easier examples first and gradually increasing difficulty.

Cursor

Coding & Applications

An AI-native code editor built on VS Code with strong agentic features.

D

E

Edge Deployment

Pricing & Performance

Running models close to end users to reduce latency.

Embedding

RAG & Knowledge

An embedding is a list of numbers, usually called a vector, that represents the meaning of a piece of data. Semantically similar items end up close together in this numeric space, which lets a computer compare meaning using distance rather than exact keyword matches. Embedding models are trained to produce these vectors. For text, the model reads a sentence or document and outputs a dense vector, often with hundreds or thousands of dimensions. You can then measure similarity with cosine similarity or Euclidean distance. Two sentences about payment processing will have embeddings closer to each other than a sentence about baseball, even if they share no common words. Embeddings power search, recommendations, clustering, and RAG. A typical RAG system stores document chunks as vectors in a vector database and retrieves the nearest neighbors to a user's query embedding. Embeddings can also represent images, audio, and other modalities when the model is trained on multimodal data.
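
Cosine similarity, mentioned above, can be computed with the standard library alone. The three-dimensional vectors below are invented for illustration; real embeddings come from an embedding model and have far more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0                       # convention for zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not output from a real model.
payments_a = [0.9, 0.1, 0.0]
payments_b = [0.8, 0.2, 0.1]
baseball = [0.0, 0.1, 0.9]

print(cosine_similarity(payments_a, payments_b) >
      cosine_similarity(payments_a, baseball))
```

The two payment vectors point in nearly the same direction, so they score close to 1, while the baseball vector scores close to 0.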

Embedding Model

Models & Architecture

A model that converts data into dense numerical vectors that capture semantic meaning.

Encoder

Models & Architecture

A model component that processes input into a dense internal representation.

Encoder-Decoder

Models & Architecture

An architecture that first encodes input into a representation and then decodes it into output.

Entity

RAG & Knowledge

A distinct object or concept represented in a knowledge graph, such as a person, place, or product.

Epoch

Fundamentals

One complete pass through the entire training dataset during model training.

Evaluation

Evaluation & Safety

The process of measuring a model's performance on tasks or benchmarks.

Explainability

Evaluation & Safety

The degree to which a model's decisions can be understood by humans.

Extractive Summarization

Coding & Applications

Creating a summary by selecting existing sentences or phrases from the source.

F

F1 Score

Evaluation & Safety

The harmonic mean of precision and recall.
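
For reference, precision, recall, and F1 can be computed from counts of true positives, false positives, and false negatives. The counts in the example are made up.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that were correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


# Made-up counts: 8 true positives, 2 false positives, 4 false negatives.
print(round(f1_score(8, 2, 4), 3))
```

Because it is a harmonic mean, F1 is pulled toward the lower of the two values, so a model cannot score well by maximizing only precision or only recall.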

Factuality

Evaluation & Safety

The degree to which generated content is factually correct.

Fairness

Evaluation & Safety

The property of a model treating different groups equitably.

Feature

Coding & Applications

An individual measurable property or characteristic of data used by a model.

Feature Engineering

Coding & Applications

Transforming raw data into features that improve model performance.

Few-Shot Learning

Training & Fine-tuning

Learning a task from only a few examples, often by including them in the prompt.

Few-Shot Prompting

Prompt Engineering

Including examples of desired input-output pairs in the prompt to guide the model.

Fine-tuning

Training & Fine-tuning

Fine-tuning is the process of further training a pre-trained model on a smaller, task-specific dataset so it becomes better at a particular job. The base model already knows grammar, facts, and reasoning from pre-training; fine-tuning teaches it the style, format, or domain you care about. Common reasons to fine-tune include matching a brand voice, classifying support tickets, extracting structured fields from documents, and improving performance on low-resource languages. You typically need hundreds to thousands of high-quality examples. Each example pairs an input with the desired output, and the model's weights are updated to reduce the error on those examples. Fine-tuning is not always the right first step. Prompt engineering, retrieval augmentation, and few-shot examples are faster and cheaper to iterate on. Fine-tuning becomes worthwhile when the behavior you want is hard to describe in a prompt, must be consistent at scale, or needs to run without sending long examples with every request. Techniques like LoRA and QLoRA make fine-tuning feasible on consumer hardware by freezing the original weights and training only small low-rank adapter matrices; QLoRA additionally quantizes the frozen base model to save memory.
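
To see why LoRA is cheap, compare parameter counts. For a weight matrix of shape d_out by d_in, LoRA trains two low-rank factors of shapes d_out by r and r by d_in. The layer size and rank below are illustrative assumptions, not values from any particular model.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable values when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values in the two low-rank factors (d_out x r and r x d_in)."""
    return rank * (d_out + d_in)


# Illustrative sizes: one 4096 x 4096 layer adapted with rank 8.
full = full_update_params(4096, 4096)
lora = lora_params(4096, 4096, rank=8)
print(f"{lora} of {full} values trained ({100 * lora / full:.2f}%)")
```

With these sizes the adapter trains well under one percent of the values in the layer, which is where the memory savings come from.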

First-Token Latency

Pricing & Performance

The time until the first token of a response is received.

Foundation Model

Models & Architecture

A large model trained on broad data that can be adapted to many downstream tasks.

Function Calling

Prompt Engineering

A model capability to generate calls to external functions with structured arguments.

G

H

I

J

K

L

M

Machine Learning

Fundamentals

A subset of AI where systems improve at tasks through experience and data without being explicitly programmed.

Matrix Factorization

Coding & Applications

A technique that decomposes user-item interaction matrices into latent factors.

Max Tokens

Prompt Engineering

The maximum number of tokens a model is allowed to generate in a response.

MCP

MCP & Protocols

MCP stands for Model Context Protocol. It is an open standard that lets AI clients connect to external tools, data sources, and prompts through a single, consistent interface. Anthropic introduced MCP in late 2024, and it has since been adopted by Claude Desktop, Cursor, Cline, VS Code, Windsurf, and a growing list of community clients. An MCP server is a small program that exposes three things: tools the model can call, resources the client can read, and prompts that help users accomplish common tasks. An MCP client discovers those capabilities and decides when to invoke them. Transport is usually stdio for local servers and HTTP for remote ones, using Server-Sent Events or, in newer versions of the specification, Streamable HTTP. For developers, MCP removes the need to build a custom integration for every API. You write one server, and any compatible client can use it. For users, it means AI assistants can access files, databases, SaaS tools, and web services through one protocol, without each client reinventing the wheel.
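
MCP messages are JSON-RPC 2.0 objects. The sketch below builds the two requests a client typically sends to discover and invoke tools. The `get_weather` tool and its `city` argument are hypothetical; a real server defines its own tool names and argument schemas.

```python
import json
from typing import Any, Dict, Optional


def make_request(request_id: int, method: str,
                 params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes.
list_tools = make_request(1, "tools/list")

# Invoke one tool by name; the tool and its arguments are made up here.
call_tool = make_request(2, "tools/call", {
    "name": "get_weather",
    "arguments": {"city": "Paris"},
})

print(json.dumps(call_tool))
```

Over the stdio transport, each such message is written as a single line of JSON to the server's standard input, and responses come back the same way on standard output.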

MCP Client

MCP & Protocols

An application that connects to MCP servers and uses their capabilities.

MCP Server

MCP & Protocols

A program that exposes tools, resources, and prompts via the Model Context Protocol.

Mechanistic Interpretability

Evaluation & Safety

A research area that reverse-engineers neural networks to understand their internal circuits.

Metric

Evaluation & Safety

A quantitative measure of model performance, such as accuracy, F1, or BLEU.

Mixture of Experts

Models & Architecture

An architecture where only a subset of specialized sub-networks is activated per input, improving efficiency.

MLOps

Coding & Applications

Practices for deploying and maintaining machine learning models in production.

Model Context Protocol

MCP & Protocols

An open standard that lets AI assistants connect to external data sources and tools through a common interface.

Moderation

Evaluation & Safety

Filtering or flagging content that violates safety policies.

Monitoring

Coding & Applications

Tracking system health, performance, and behavior over time.

Multi-Agent System

Agents & Tools

A system where multiple agents collaborate, compete, or delegate tasks to achieve complex goals.

Multi-Head Attention

Models & Architecture

Running multiple attention mechanisms in parallel to capture different kinds of relationships between tokens.

Multimodal Model

Models & Architecture

A model that can process or generate multiple types of data, such as text, images, and audio.

N

O

P

Parameter

Fundamentals

An internal variable of a neural network, such as a weight or bias, that is learned during training and determines model behavior.

Parameter-Efficient Fine-Tuning

Training & Fine-tuning

Methods that adapt a pre-trained model to new tasks while updating only a small fraction of parameters.

Perplexity

Evaluation & Safety

A measure of how well a probability model predicts a sample; lower is better.
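
Concretely, perplexity is the exponential of the average negative log-probability the model assigned to each token that actually occurred. A minimal sketch, with made-up probabilities:

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """exp of the mean negative log-probability of the observed tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# A model that gives every observed token probability 0.25 is as uncertain
# as a uniform choice among four options.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))
```

A perplexity of 4 can be read as the model being, on average, as unsure as if it were picking uniformly among four tokens.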

Persona

Prompt Engineering

A defined identity or character assigned to a model in a prompt.

Personalization

Coding & Applications

Tailoring outputs or recommendations to individual users.

Pipeline

Agents & Tools

A linear sequence of data processing or model steps.

Planning

Agents & Tools

The process of deciding which actions to take and in what order to achieve a goal.

Plugin

Agents & Tools

An add-on module that extends a system's capabilities.

Pre-training

Training & Fine-tuning

Training a model on a large corpus to learn general language patterns before task-specific adaptation.

Precision

Evaluation & Safety

The proportion of predicted positives that are actually correct.

Preference Model

Training & Fine-tuning

A model that learns to rank outputs based on which ones humans prefer.

Preprocessing

Coding & Applications

Cleaning and transforming raw data before it is fed to a model.

Prompt

Prompt Engineering

The input text given to a language model to elicit a desired response.

Prompt Chaining

Prompt Engineering

Breaking a complex task into a sequence of prompts where each step uses the previous output.

Prompt Engineering

Prompt Engineering

Prompt engineering is the practice of crafting inputs to a language model so it produces better outputs without changing the model's weights. It covers word choice, structure, examples, constraints, and the order in which information appears. A well-engineered prompt can turn a mediocre response into a precise, actionable one. Effective prompts are usually clear, specific, and well formatted. They state the task, define the audience, set the output format, and include any constraints. Adding examples, known as few-shot prompting, helps the model understand patterns that are hard to describe in words. Asking the model to reason step by step before answering, called chain-of-thought prompting, often improves reasoning and arithmetic. Prompt engineering is iterative. You write a prompt, test it on diverse inputs, measure the results, and refine. Tools like the VePrompts Prompt Optimizer can surface issues such as ambiguity, missing constraints, or conflicting instructions. Good prompt engineering is often the fastest way to improve an AI feature before investing in fine-tuning or custom infrastructure.

Prompt Injection

Prompt Engineering

An attack where malicious input overrides or leaks system instructions.

Prompt Template

MCP & Protocols

A reusable prompt pattern provided by an MCP server for common tasks.

Protocol Buffers

MCP & Protocols

A language-neutral binary serialization format developed by Google.

Q

R

RAG

Prompt Engineering

RAG stands for Retrieval-Augmented Generation. It is a pattern that gives a language model access to information outside its training data by fetching relevant documents at query time and including them in the prompt. Instead of memorizing facts, the model reasons over retrieved snippets, which makes answers more accurate, current, and traceable. A typical RAG pipeline has four stages. First, documents are split into chunks and converted into embeddings using an embedding model. Second, those embeddings are stored in a vector database. Third, when a user asks a question, the system embeds the query and searches the database for the closest chunks. Finally, the retrieved chunks are added to the prompt as context, and the model generates an answer grounded in that evidence. RAG is especially useful when answers depend on private data, such as internal wikis, support tickets, or product documentation. It also reduces hallucination because the model can cite the retrieved text. Teams often tune RAG by changing chunk size, overlap, reranking algorithms, and query rewriting strategies.
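
The four stages above can be sketched end to end with toy components. This is an illustration under loud assumptions: `embed` counts words instead of calling an embedding model, a Python list stands in for the vector database, and the final model call is left out, so the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

Index = List[Tuple[str, Counter]]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system calls a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_index(chunks: List[str]) -> Index:
    """Stages 1 and 2: embed each chunk and store it (here, in a list)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: Index, query: str, k: int = 2) -> List[str]:
    """Stage 3: embed the query and return the k closest chunks."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, context: List[str]) -> str:
    """Stage 4: put the retrieved chunks into the prompt as context."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


chunks = [
    "Refunds are issued within 5 business days.",
    "Our office is closed on public holidays.",
    "Password resets require a verified email address.",
]
index = build_index(chunks)
question = "How long do refunds take?"
print(build_prompt(question, retrieve(index, question, k=1)))
```

Swapping the word-count `embed` for a real embedding model is what lets retrieval match on meaning rather than on shared words.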

Rate Limit

Pricing & Performance

A cap on the number of requests or tokens allowed in a time window.

Re-ranking

RAG & Knowledge

A second-stage model that scores and reorders retrieved documents for better relevance.

ReAct

Agents & Tools

A pattern where an agent Reasons and Acts in alternating steps to solve tasks.

Reasoning Model

Models & Architecture

A model optimized for step-by-step logical reasoning and complex problem solving.

Recall

Evaluation & Safety

The proportion of actual positives that were correctly identified.

Recommendation

Coding & Applications

Suggesting items or actions to users based on data.

Red Teaming

Evaluation & Safety

Attempting to find vulnerabilities, biases, or harmful behaviors in a model.

Reflection

Agents & Tools

An agent evaluating its own output and revising it based on critique.

Regularization

Fundamentals

Techniques used to reduce overfitting by discouraging overly complex models.

Reinforcement Learning

Coding & Applications

Learning by interacting with an environment and receiving rewards or penalties.

Reinforcement Learning from Human Feedback

Training & Fine-tuning

Training a model using human preference signals to make outputs more helpful and harmless.

Relationship

RAG & Knowledge

A connection between two entities in a knowledge graph.

Resource

MCP & Protocols

Read-only data exposed by an MCP server that a client can pull into context.

REST

MCP & Protocols

Representational State Transfer, an architectural style for designing networked APIs.

Retrieval-Augmented Generation

Prompt Engineering

Generating responses grounded in retrieved external documents to improve accuracy and recency.

Reward Model

Training & Fine-tuning

A model trained to score outputs according to human preferences, used in RLHF.

RLHF

Training & Fine-tuning

Short for Reinforcement Learning from Human Feedback.

Robustness

Evaluation & Safety

A model's ability to maintain performance under noisy or adversarial inputs.

Role Prompting

Prompt Engineering

Asking the model to assume a specific role or persona to shape its responses.

ROUGE

Evaluation & Safety

A set of metrics for evaluating automatic summarization by comparing overlap with reference summaries.

S

Safety

Evaluation & Safety

Practices that reduce harmful, unethical, or dangerous model outputs.

Sandbox

Agents & Tools

An isolated execution environment that limits what code can access.

Scaling

Pricing & Performance

Adjusting compute resources to handle varying workloads.

Self-Attention

Models & Architecture

Attention applied within a single sequence, allowing each token to relate to every other token.

Self-Correction

Agents & Tools

An agent identifying and fixing its own mistakes.

Semantic Search

Coding & Applications

Searching by meaning rather than exact keyword matches, often using embeddings.

Sentiment Analysis

Coding & Applications

Determining the emotional tone or opinion expressed in text.

Sequence-to-Sequence

Coding & Applications

A model architecture that maps an input sequence to an output sequence.

Similarity Search

RAG & Knowledge

Finding items with embeddings close to a query embedding, usually by cosine or Euclidean distance.

Skill

Agents & Tools

A specific capability advertised by an agent or service.

Sparse Model

Models & Architecture

A model where most parameters are inactive for any given input, reducing compute per forward pass.

SSE

MCP & Protocols

Server-Sent Events, an HTTP-based transport for streaming messages from server to client.

Standardization

Coding & Applications

Transforming data to have zero mean and unit variance.
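
A minimal sketch of the transformation using the standard library, assuming the population standard deviation is the one wanted. The sample values are made up.

```python
import statistics
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("all values are identical; cannot standardize")
    return [(value - mean) / std for value in values]


# Made-up data with mean 5 and population standard deviation 2.
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```

Each output value is the number of standard deviations the original value sits above or below the mean.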

STDIO

MCP & Protocols

A transport that uses standard input and output for local MCP server communication.

Stop Sequence

Prompt Engineering

A string that signals the model to stop generating further tokens.

Structured Output

Prompt Engineering

Requiring the model to produce output conforming to a defined schema.

Summarization

Coding & Applications

Producing a shorter version of a longer text while preserving key information.

Supervised Learning

Coding & Applications

Training a model on labeled input-output pairs.

Synthetic Data

Training & Fine-tuning

Data generated by models or simulations rather than collected from real-world sources.

System Prompt

Prompt Engineering

A system prompt is the high-level instruction that sets the model's role, tone, constraints, and behavior for a conversation. It is placed at the start of the context and influences every response that follows. While users see the assistant's reply, they usually do not see the system prompt unless the application exposes it. A good system prompt is specific and scoped. Instead of saying "You are helpful," it might say "You are a senior React reviewer who gives concise feedback in bullet points, flags security issues, and never writes full code replacements." This reduces ambiguity and makes the model's output more consistent across sessions. System prompts are also the first line of defense for safety and product requirements. You can use them to enforce output formats, reject off-topic requests, require citations, or ask the model to disclose uncertainty. Because they carry so much influence, small changes to a system prompt often produce larger improvements than adding more examples to user messages.

T

Temperature

Prompt Engineering

Temperature is a sampling parameter that controls how random a language model's outputs are. The logits, or raw scores, that the model assigns to each possible next token are divided by the temperature before they are converted to probabilities and a token is sampled. A lower temperature makes the model more conservative and deterministic; a higher temperature makes it more creative and varied. At temperature zero, the model almost always picks the highest-scoring token, which is ideal for tasks like code generation, factual answers, and structured output where consistency matters. At temperature one or above, the model is more willing to sample lower-scoring tokens, which can produce surprising phrasing, creative writing, and diverse brainstorming ideas. There is no universal best setting. Coding and data extraction usually benefit from low temperatures around 0.1 to 0.3. Marketing copy, fiction, and idea generation often feel better at 0.7 to 1.0. If outputs are too repetitive, raise the temperature. If they become erratic or off-topic, lower it.
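
The effect is easy to see numerically. This sketch divides a set of made-up logits by the temperature and applies softmax. It assumes a temperature above zero; implementations typically treat zero as a special case that simply picks the top token.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float],
                             temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)                   # subtract the max for stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.0]                 # made-up scores for three tokens
for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, and higher temperatures flatten the distribution so that lower-scoring tokens are sampled more often.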

Test-Time Compute

Models & Architecture

Spending more computation during inference to improve output quality, such as through reasoning or search.

Testing

Coding & Applications

Evaluating software to ensure it behaves as expected.

Throttling

Pricing & Performance

Slowing or limiting requests to enforce rate limits.

Throughput

Pricing & Performance

The number of tokens or requests processed per unit of time.

Token

Prompt Engineering

A token is the basic unit a language model reads and writes. It can be a whole word, part of a word, or even a single punctuation mark. Models do not see raw text; they see a sequence of token IDs produced by a tokenizer. Different models use different tokenizers, so the same sentence may cost a different number of tokens on GPT-4o than on Claude or Gemini. As a rough guide, one token is about four English characters or three quarters of a word. By that rule a 500-word article is roughly 650 to 900 tokens, depending on the tokenizer. Tokens matter for two reasons. First, API pricing is usually per token, so longer prompts and outputs cost more. Second, models have a context window measured in tokens; if your input exceeds that limit, the model cannot process it. Tools like the VePrompts tokenizer show you exactly how a specific model splits your text.

Tokenization

Coding & Applications

Splitting text into tokens for model processing.

Tokenizer

Prompt Engineering

A tool that converts text into tokens for model processing.

Tokens Per Second

Pricing & Performance

The rate at which a model generates tokens after the first one.

Tool

Agents & Tools

An external function an agent can call to perform an action or retrieve data.

Tool Use

Prompt Engineering

The ability of a model to invoke external tools or APIs to complete tasks.

Top-k

Prompt Engineering

A sampling parameter that limits token selection to the k highest-probability candidates.

Top-p

Prompt Engineering

A nucleus sampling parameter that limits token selection to the smallest set of top-ranked tokens whose cumulative probability reaches or exceeds p.
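
Both filters can be sketched over a small, made-up next-token distribution. Each keeps a subset of tokens and renormalizes the survivors so they sum to one. This sketch stops the top-p set at the first point where the running total reaches p; implementations differ on details such as tie handling.

```python
from typing import Dict, List, Tuple


def _ranked(probs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Tokens sorted from most to least probable."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)


def _renormalize(items: List[Tuple[str, float]]) -> Dict[str, float]:
    total = sum(prob for _, prob in items)
    return {token: prob / total for token, prob in items}


def top_k_filter(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep only the k most probable tokens."""
    return _renormalize(_ranked(probs)[:k])


def top_p_filter(probs: Dict[str, float], p: float) -> Dict[str, float]:
    """Keep the smallest top-ranked set whose total probability reaches p."""
    kept: List[Tuple[str, float]] = []
    cumulative = 0.0
    for token, prob in _ranked(probs):
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    return _renormalize(kept)


# Made-up distribution over four candidate tokens.
next_token = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
print(sorted(top_k_filter(next_token, 2)))
print(sorted(top_p_filter(next_token, 0.9)))
```

Top-k always keeps a fixed number of candidates, while top-p keeps more candidates when the model is uncertain and fewer when one token dominates.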

Toxicity

Evaluation & Safety

The presence of harmful, offensive, or abusive content in model outputs.

Tracing

Coding & Applications

Recording the path of a request through a system to diagnose issues.

Transfer Learning

Training & Fine-tuning

Leveraging knowledge learned from one task to improve performance on a related task.

Transformer

Models & Architecture

A neural network architecture using self-attention to process sequences in parallel, forming the basis of modern LLMs.

Translation

Coding & Applications

Converting text from one language to another.

Transport

MCP & Protocols

The communication channel between an MCP client and server, such as stdio or SSE.

Triple

RAG & Knowledge

A subject-predicate-object statement representing a fact in a knowledge graph.

Trust

Evaluation & Safety

Confidence that a model will behave reliably, safely, and as intended.

U

V

W

Z