Evaluation & Safety
Evaluation
The process of measuring a model's performance on tasks or benchmarks.
Published 2026-06-12
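As a rough illustration of the definition above, the sketch below scores a model on a small labeled benchmark using accuracy and F1. Everything in it is hypothetical and chosen only for illustration: the `evaluate` helper, the `toy_model` threshold rule, and the five-example benchmark are assumptions, not part of any particular framework.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def f1_score(predictions: Sequence[int], labels: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the chosen positive class."""
    pairs = list(zip(predictions, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(model: Callable[[int], int], benchmark: List[Tuple[int, int]]) -> Dict[str, float]:
    """Run the model on every benchmark input and report each metric."""
    inputs, labels = zip(*benchmark)
    predictions = [model(x) for x in inputs]
    return {
        "accuracy": accuracy(predictions, labels),
        "f1": f1_score(predictions, labels),
    }


# Hypothetical toy benchmark: (input, expected label) pairs.
benchmark = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 1)]


def toy_model(x: int) -> int:
    # Stand-in for a real model: predicts the positive class when x >= 3.
    return 1 if x >= 3 else 0


scores = evaluate(toy_model, benchmark)
print(scores)
```

Holding the benchmark and the metric definitions fixed is what makes scores comparable between models; swapping in a different `model` callable is the only change needed to evaluate another system in this sketch.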
Related Resources
Benchmark
Glossary: A standardized dataset and task used to compare models.
AI Model Evaluation Framework
Prompt: Design comprehensive benchmarking protocols for evaluating AI models across multiple dimensions, including reasoning, creativity, coding, and safety, with reproducible methodologies.
3D Printing Optimizer
Skill: Optimize 3D models for additive manufacturing, considering orientation, supports, infill, and material properties.
Firecrawl
MCP Server: Official Firecrawl MCP Server. Adds powerful web scraping and search to Cursor, Claude, and any other LLM clients.
Metric
Glossary: A quantitative measure of model performance, such as accuracy, F1, or BLEU.