
TruLens

TruLens instruments your LLM app, records every call as a trace, and scores each trace with pluggable feedback functions (groundedness, answer relevance, context relevance, toxicity, or your own custom metric). The local Streamlit dashboard makes it easy to compare experiment runs side by side and regression-check a RAG pipeline.
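At their core, custom feedback functions are just callables that map text to a score in [0, 1]. A minimal dependency-free sketch of the idea, using a toy token-overlap heuristic in place of the LLM-judged scorers TruLens ships (the function name is illustrative, not TruLens API):

```python
def overlap_relevance(question: str, context: str) -> float:
    """Toy feedback function: fraction of question tokens that appear
    in the retrieved context. Real feedbacks use an LLM judge, but the
    contract is the same: text in, float in [0, 1] out."""
    q_tokens = set(question.lower().split())
    c_tokens = set(context.lower().split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & c_tokens) / len(q_tokens)

# A score near 1.0 means the context covers the question's terms.
score = overlap_relevance("what is vset", "VSET is a voltage setting")
```

Any callable with this shape can be wrapped in a `Feedback` object and attached to an app alongside the built-in scorers.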

Framework facts

Category
evals
Language
Python
License
MIT
Repository
https://github.com/truera/trulens

Install

pip install trulens trulens-providers-openai

(The OpenAI provider used in the quickstart ships as a separate package.)

Quickstart

from trulens.core import TruSession
from trulens.apps.custom import TruCustomApp
from trulens.providers.openai import OpenAI as TruOpenAI
from trulens.core.feedback import Feedback

# LLM-judged answer relevance, applied to each (input, output) pair
provider = TruOpenAI()
relevance = Feedback(provider.relevance).on_input_output()

session = TruSession()
# my_app is your own app object (e.g. a class whose methods are
# decorated with @instrument); app_name replaced app_id in TruLens 1.x
tru_app = TruCustomApp(my_app, app_name='demo', feedbacks=[relevance])
with tru_app as recording:
    my_app.query('What is VSET?')
session.get_leaderboard()  # aggregate feedback scores per app

Alternatives

  • Ragas — focused on RAG scoring
  • Promptfoo — YAML eval sweeps
  • DeepEval — pytest-style assertions
  • Arize Phoenix — open observability + evals

Frequently asked questions

What is the RAG Triad?

A popular TruLens eval pattern scoring three legs of a RAG pipeline: (1) context relevance of the retrieved chunks to the question, (2) groundedness of the answer in those chunks, and (3) relevance of the answer to the question. Scoring all three catches most RAG regressions: a bad retriever fails (1), hallucination fails (2), and an off-topic answer fails (3).
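The three legs can be sketched as plain scoring callables. This is a toy lexical heuristic standing in for TruLens's LLM-judged feedbacks, with all names illustrative:

```python
def _overlap(a: str, b: str) -> float:
    """Fraction of a's tokens that appear in b (toy proxy for an LLM judge)."""
    a_tok, b_tok = set(a.lower().split()), set(b.lower().split())
    return len(a_tok & b_tok) / len(a_tok) if a_tok else 0.0

def rag_triad(question: str, chunks: list[str], answer: str) -> dict[str, float]:
    """Score the three RAG Triad legs; each value lies in [0, 1]."""
    context = " ".join(chunks)
    return {
        "context_relevance": _overlap(question, context),  # chunks vs question
        "groundedness": _overlap(answer, context),         # answer vs chunks
        "answer_relevance": _overlap(answer, question),    # answer vs question
    }

scores = rag_triad(
    "what is vset",
    ["vset is a voltage setting"],
    "vset sets voltage",
)
```

In TruLens proper, each leg is a separate `Feedback` with selectors pointing at the question, the retrieval step's return values, and the final answer, so each trace gets all three scores.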

Does TruLens need a backend?

No. By default it writes traces to a local SQLite database and serves a local dashboard. Snowflake users can log to a Snowflake database and run Cortex-backed feedback functions for scale.

Sources

  1. TruLens — GitHub — accessed 2026-04-20
  2. TruLens — docs — accessed 2026-04-20