VePrompts
RAG & Knowledge

Ingestion Pipeline

The process of loading, chunking, embedding, and storing documents for retrieval.

Published 2026-06-12
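The four stages in the definition can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation. Documents come from an in-memory dict. Chunks are fixed-size word windows with overlap. A hashing function (`toy_embed`, hypothetical) stands in for a real embedding model, and a plain Python list stands in for the vector database. All names here are illustrative and do not belong to any library's API.

```python
import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str
    index: int  # position of this chunk within its source document
    text: str
    vector: list[float] = field(default_factory=list)


def load_documents(raw: dict[str, str]) -> dict[str, str]:
    """Load step: the 'source' here is just an in-memory dict of id -> text; empty docs are skipped."""
    return {doc_id: text.strip() for doc_id, text in raw.items() if text.strip()}


def chunk_text(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Chunk step: windows of `size` words, each sharing `overlap` words with the previous window."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reached the end of the text
            break
    return chunks


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Embed step (hypothetical stand-in for a real model): hash words into buckets, then L2-normalize."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def ingest(raw: dict[str, str], store: list[Chunk], size: int = 50, overlap: int = 10) -> int:
    """Run load -> chunk -> embed -> store, appending to `store`; returns the number of chunks written."""
    written = 0
    for doc_id, text in load_documents(raw).items():
        for i, piece in enumerate(chunk_text(text, size, overlap)):
            store.append(Chunk(doc_id, i, piece, toy_embed(piece)))
            written += 1
    return written


if __name__ == "__main__":
    store: list[Chunk] = []
    count = ingest({"doc-1": "Invoices are paid within thirty days. " * 20}, store, size=30, overlap=5)
    print(count, store[0].doc_id, len(store[0].vector))  # prints: 5 doc-1 64
```

Overlapping windows are one common choice, because a sentence cut at a chunk boundary still appears whole in the neighboring chunk. Real pipelines often split on sentences, headings, or tokens instead of words, and they write to an actual vector store rather than a list.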

Related Resources

Vector Database

Glossary

A database optimized for storing and searching high-dimensional embeddings.

DeepSeek Coder Architect

Prompt

Leverage DeepSeek Coder for complex software architecture, code generation, and technical problem-solving with advanced reasoning.

3D Printing Optimizer

Skill

Optimize 3D models for additive manufacturing considering orientation, supports, infill, and material properties.

Firecrawl

MCP Server

Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM clients.

Embedding

Glossary

An embedding is a list of numbers, usually called a vector, that represents the meaning of a piece of data. Semantically similar items end up close together in this numeric space, which lets a computer compare meaning using distance rather than exact keyword matches.

Embedding models are trained to produce these vectors. For text, the model reads a sentence or document and outputs a dense vector, often with hundreds or thousands of dimensions. You can then measure similarity with cosine similarity or Euclidean distance. Two sentences about payment processing will have embeddings closer to each other than to a sentence about baseball, even if the two payment sentences share no words.

Embeddings power search, recommendations, clustering, and RAG. A typical RAG system stores document chunks as vectors in a vector database and retrieves the chunks whose vectors are nearest to the embedding of the user's query. Embeddings can also represent images, audio, and other modalities when the model is trained on multimodal data.
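The two similarity measures and the nearest-neighbor lookup described above can be shown in a short Python sketch. The three-dimensional vectors and their labels are made up for illustration. Real embedding models produce far longer vectors. The brute-force scan over every stored vector stands in for the approximate nearest-neighbor indexes that vector databases often use at scale.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.dist(a, b)


def nearest_neighbors(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force top-k: score every stored vector against the query, highest cosine first."""
    return sorted(index, key=lambda key: cosine_similarity(query, index[key]), reverse=True)[:k]


# Made-up vectors for illustration only; a real system would get these from an embedding model.
index = {
    "card payment declined": [0.9, 0.1, 0.0],
    "refund to credit card": [0.7, 0.5, 0.1],
    "home run in the ninth inning": [0.0, 0.2, 0.95],
}
query = [0.85, 0.2, 0.05]  # pretend embedding of "why was my card payment rejected?"

if __name__ == "__main__":
    print(nearest_neighbors(query, index, k=2))
    # prints: ['card payment declined', 'refund to credit card']
```

The two payment entries rank above the baseball entry even though their labels share no words with the query. In this toy setup, that ranking follows only from how the vectors were chosen.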