Curiosity · Concept

RAGAS Metrics

RAGAS (Retrieval Augmented Generation Assessment), introduced by Es et al. in 2023, is the de facto standard for evaluating RAG systems when you do not have reference answers. It uses an LLM judge to compute four primary metrics: faithfulness (does the answer use only facts from the retrieved context?), answer relevance (does the answer actually address the question?), context precision (are the retrieved chunks relevant?), and context recall (was all the necessary context retrieved?). Modern RAGAS versions add further metrics, but these four remain the core.

Quick reference

Proficiency
Intermediate
Also known as
RAGAS, RAG Assessment
Prerequisites
Retrieval-Augmented Generation, LLM-as-judge

Frequently asked questions

What is RAGAS?

RAGAS is an open-source framework for evaluating retrieval-augmented generation pipelines. It uses an LLM judge to score outputs along interpretable dimensions like faithfulness and context precision.

What do the four core metrics mean?

Faithfulness: every claim in the answer is supported by the retrieved context. Answer relevance: the answer addresses the question, not just related topics. Context precision: the retrieved chunks are actually useful. Context recall: the retrieved chunks together contain enough information to produce the answer.
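As a rough illustration of the shape of these metrics, the score templates can be sketched with plain set arithmetic over pre-labelled claims and chunks. This is a toy sketch, not the RAGAS implementation: RAGAS uses an LLM judge to extract and verify claims, and its context precision is rank-weighted rather than a flat fraction. All names and data below are hypothetical.

```python
def faithfulness(answer_claims: list[str], supported: set[str]) -> float:
    # Fraction of claims in the answer that the retrieved context supports.
    return sum(c in supported for c in answer_claims) / len(answer_claims) if answer_claims else 0.0

def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    # Simplified: fraction of retrieved chunks judged relevant.
    # (RAGAS itself averages precision@k over the ranks of relevant chunks.)
    return sum(c in relevant for c in retrieved) / len(retrieved) if retrieved else 0.0

def context_recall(reference_claims: list[str], supported: set[str]) -> float:
    # Fraction of reference-answer claims attributable to the retrieved context.
    return sum(c in supported for c in reference_claims) / len(reference_claims) if reference_claims else 0.0

# Hypothetical example: two claims in the answer, one supported by context.
claims = ["paris is the capital of france", "paris has 2.1m inhabitants"]
supported = {"paris is the capital of france"}
print(faithfulness(claims, supported))  # 0.5
```

In practice the hard part is the claim extraction and verification, which is exactly what the LLM judge does; the arithmetic on top is this simple.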

Do I need ground-truth answers to use RAGAS?

Faithfulness, answer relevance, and context precision can be computed with just the query, the retrieved context, and the generated answer. Context recall usually requires a reference answer or gold context, so a hybrid of reference-free and reference-based evaluation is common.

What are the known limitations?

RAGAS relies on LLM judges, so it inherits their biases and calibration issues. Scores drift when you swap judge models. Best practice is to pin the judge model, calibrate against a human-labelled subset, and track trends rather than absolute values.
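The calibration step above can be sketched with stdlib-only Python: score a small human-labelled subset with the pinned judge, then check both absolute error and rank agreement before trusting the judge at scale. The scores below are made up for illustration.

```python
from statistics import mean

# Hypothetical scores for the same six examples:
# the pinned LLM judge vs. human labels on a calibration subset.
judge_scores = [0.9, 0.7, 0.2, 0.8, 0.4, 0.6]
human_scores = [1.0, 0.8, 0.0, 0.9, 0.3, 0.7]

# Mean absolute error: how far the judge sits from humans on average.
mae = mean(abs(j - h) for j, h in zip(judge_scores, human_scores))

# Pairwise rank agreement (Kendall-style): does the judge order
# examples the same way humans do? Trends survive even when the
# judge's absolute scale is miscalibrated.
pairs = [(i, k) for i in range(len(judge_scores)) for k in range(i + 1, len(judge_scores))]
agreement = mean(
    (judge_scores[i] - judge_scores[k]) * (human_scores[i] - human_scores[k]) > 0
    for i, k in pairs
)

print(f"MAE={mae:.2f}, rank agreement={agreement:.2f}")  # MAE=0.12, rank agreement=1.00
```

Here the judge is offset from the humans (non-zero MAE) but ranks examples identically, which is why tracking trends rather than absolute values is the safer habit.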

Sources

  1. Es et al. — RAGAS: Automated Evaluation of Retrieval Augmented Generation — accessed 2026-04-20
  2. RAGAS — Documentation — accessed 2026-04-20