VePrompts
GPT-4o Coding & Development

While optimized for GPT-4o, this prompt is compatible with most major AI models.

RAG Pipeline Architect

Design production-ready Retrieval-Augmented Generation pipelines with advanced chunking strategies, embedding optimization, and hybrid search capabilities for enterprise knowledge bases.

Expert Note

This prompt guides the design of RAG systems that use modern techniques such as hybrid search, reranking, and contextual chunking. It is well suited to building enterprise knowledge management systems.

Est. 874 tokens
# Role

You are a senior AI Engineer specializing in Retrieval-Augmented Generation (RAG) systems. You have deep expertise in vector databases, embedding models, chunking strategies, and information retrieval optimization.

## Task

Design a production-grade RAG pipeline for [KNOWLEDGE_DOMAIN] that efficiently retrieves relevant information and generates accurate, contextually grounded responses.

## RAG Architecture Components

### 1. Document Processing Pipeline

```
Ingestion Flow:
Raw Documents → Preprocessing → Chunking → Embedding → Indexing → Storage
```

**Chunking Strategies to Consider:**
- **Semantic Chunking**: Split at natural boundaries (paragraphs, sections)
- **Fixed-size with Overlap**: Consistent chunk sizes with context overlap
- **Agentic Chunking**: LLM-based intelligent splitting
- **Hierarchical Chunking**: Parent-child relationships between chunks

### 2. Embedding Strategy

```
Embedding Selection Matrix:
├── Model: [text-embedding-3-large, voyage-2, etc.]
├── Dimensions: [1536, 768, etc.]
├── Context Window: [8192, etc.]
├── Normalization: [L2, none]
└── Batch Size: [optimization parameter]
```

### 3. Retrieval Architecture

**Hybrid Search Implementation:**
- **Dense Retrieval**: Vector similarity search
- **Sparse Retrieval**: BM25/TF-IDF keyword matching
- **Fusion Strategy**: Reciprocal Rank Fusion (RRF) or linear combination
- **Reranking**: Cross-encoder reranking for precision

### 4. Query Processing

```
Query Pipeline:
User Query → Query Expansion → Intent Classification →
Retrieval Strategy Selection → Multi-hop Reasoning (if needed)
```

## Advanced Features to Implement

1. **Contextual Compression**: Summarize retrieved chunks to fit context window
2. **Query Rewriting**: Transform vague queries for better retrieval
3. **Source Attribution**: Track and cite information sources
4. **Confidence Scoring**: Estimate answer reliability
5. **Multi-modal Support**: Handle images, tables, and text

## Technical Specifications

### Vector Database Selection

Compare and select from:
- **Pinecone**: Managed, scalable, metadata filtering
- **Weaviate**: GraphQL interface, modular AI
- **Chroma**: Open-source, easy prototyping
- **Qdrant**: Rust-based, high performance
- **pgvector**: PostgreSQL extension, ACID compliance

### Performance Optimization

```
Optimization Checklist:
□ Index configuration (HNSW, IVFFlat)
□ Caching strategy (query cache, embedding cache)
□ Batching for embedding generation
□ Async retrieval operations
□ Connection pooling
```

## Evaluation Framework

Design evaluation metrics:

1. **Retrieval Metrics**:
   - Hit Rate @ k
   - Mean Reciprocal Rank (MRR)
   - Normalized Discounted Cumulative Gain (NDCG)
2. **Generation Metrics**:
   - Answer relevance
   - Faithfulness to sources
   - Completeness
3. **End-to-End Metrics**:
   - Latency (p50, p95, p99)
   - Throughput
   - Cost per query

## Implementation Template

Provide:

1. **Architecture Diagram**: Visual representation
2. **Data Flow**: Step-by-step processing
3. **Code Structure**: Modular Python implementation
4. **Configuration**: Environment-specific settings
5. **Monitoring**: Observability and alerting
6. **Scaling Strategy**: Horizontal scaling approach

## Variables

- **KNOWLEDGE_DOMAIN**: Target domain (e.g., "legal documents", "medical research", "technical documentation")
- **DOCUMENT_TYPES**: Types of documents (PDFs, HTML, structured data)
- **SCALE_REQUIREMENTS**: Expected query volume and data size
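
The prompt names Reciprocal Rank Fusion (RRF) as one way to merge dense and sparse results. As an illustration only, and not part of the prompt itself, here is a minimal sketch of how such a fusion step might look. The function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a requirement.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a dense (vector) and a sparse (BM25) retriever.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Because RRF uses only rank positions, it does not need the dense and sparse scores to be on a comparable scale, which is one reason it is often preferred over a linear combination of raw scores.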

Explore Related Resources

RAG Implementation Expert

Skill

Build production-grade Retrieval-Augmented Generation systems with vector databases, embeddings, and hybrid search.

DeepSeek Coder Architect

Prompt

Leverage DeepSeek Coder for complex software architecture, code generation, and technical problem-solving with advanced reasoning.

MODULAR RAG MCP SERVER

MCP Server

A modular RAG (Retrieval-Augmented Generation) system with an MCP Server architecture. It uses a Skill to make the AI follow each step of the spec, so that the code is written entirely by AI.

RAG

Glossary

RAG stands for Retrieval-Augmented Generation. It is a pattern that gives a language model access to information outside its training data by fetching relevant documents at query time and including them in the prompt. Instead of relying only on memorized facts, the model reasons over retrieved snippets, which makes answers more accurate, current, and traceable.

A typical RAG pipeline has four stages. First, documents are split into chunks and converted into embeddings using an embedding model. Second, those embeddings are stored in a vector database. Third, when a user asks a question, the system embeds the query and searches the database for the closest chunks. Finally, the retrieved chunks are added to the prompt as context, and the model generates an answer grounded in that evidence.

RAG is especially useful when answers depend on private data, such as internal wikis, support tickets, or product documentation. It can also reduce hallucination, because answers are grounded in retrieved text that the model can cite. Teams often tune RAG by changing chunk size, overlap, reranking algorithms, and query rewriting strategies.
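
The four stages described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, `InMemoryIndex` stands in for a vector database, and all function names, parameters, and sample documents are hypothetical. The final generation call is omitted; the sketch stops at the prompt that would be sent to the model.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=12, overlap=4):
    """Stage 1a: split text into word windows that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Stage 1b: toy embedding (word counts); a real system calls a model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Stage 2: toy stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (chunk_text, vector) pairs

    def add(self, chunks):
        for chunk in chunks:
            self.items.append((chunk, embed(chunk)))

    def search(self, query, top_k=2):
        """Stage 3: embed the query and return the closest chunks."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Stage 4: place retrieved chunks in the prompt as numbered context."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(context_chunks, start=1))
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


docs = [
    "Refunds are issued within 14 days of a return request.",
    "Support is available by email on weekdays.",
]
index = InMemoryIndex()
for doc in docs:
    index.add(chunk_text(doc))

question = "How long do refunds take?"
top = index.search(question, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The `chunk_size` and `overlap` parameters are the tuning knobs mentioned above; the numbered context lines give the model something concrete to cite.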

Vertical Farm Designer

Prompt

Design vertical farming systems optimizing lighting, climate, hydroponics, and automation for urban food production.