
TruLens

TruLens instruments your LLM app, records every call as a trace, and scores each trace with pluggable feedback functions (groundedness, answer relevance, context relevance, toxicity, or your own custom metric). The local Streamlit dashboard makes it easy to compare experiment runs side by side and regression-check a RAG pipeline.
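At their core, custom feedback functions are just callables that map text to a score in [0, 1]. A minimal dependency-free sketch of the idea, using a toy token-overlap heuristic in place of the LLM-judged scorers TruLens ships (the function name is illustrative, not TruLens API):

```python
def overlap_relevance(question: str, context: str) -> float:
    """Toy feedback function: fraction of question tokens that appear
    in the retrieved context. Real feedbacks use an LLM judge, but the
    contract is the same: text in, float in [0, 1] out."""
    q_tokens = set(question.lower().split())
    c_tokens = set(context.lower().split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & c_tokens) / len(q_tokens)

# A score near 1.0 means the context covers the question's terms.
score = overlap_relevance("what is vset", "VSET is a voltage setting")
```

Any callable with this shape can be wrapped in a `Feedback` object and attached to an app alongside the built-in scorers.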

Framework facts

Category
evals
Language
Python
License
MIT
Repository
https://github.com/truera/trulens

Install

pip install trulens trulens-providers-openai

(The OpenAI provider used in the quickstart ships as a separate package.)

Quickstart

from trulens.core import TruSession
from trulens.apps.custom import TruCustomApp
from trulens.providers.openai import OpenAI as TruOpenAI
from trulens.core.feedback import Feedback

# LLM-judged answer relevance, applied to each (input, output) pair
provider = TruOpenAI()
relevance = Feedback(provider.relevance).on_input_output()

session = TruSession()
# my_app is your own app object (e.g. a class whose methods are
# decorated with @instrument); app_name replaced app_id in TruLens 1.x
tru_app = TruCustomApp(my_app, app_name='demo', feedbacks=[relevance])
with tru_app as recording:
    my_app.query('What is VSET?')
session.get_leaderboard()  # aggregate feedback scores per app

Alternatives

  • Ragas — focused on RAG scoring
  • Promptfoo — YAML eval sweeps
  • DeepEval — pytest-style assertions
  • Arize Phoenix — open observability + evals

Frequently asked questions

What is the RAG Triad?

A popular TruLens eval pattern scoring three legs of a RAG pipeline: (1) context relevance of the retrieved chunks to the question, (2) groundedness of the answer in those chunks, and (3) relevance of the answer to the question. Scoring all three catches most RAG regressions: a bad retriever fails (1), hallucination fails (2), and an off-topic answer fails (3).
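The three legs can be sketched as plain scoring callables. This is a toy lexical heuristic standing in for TruLens's LLM-judged feedbacks, with all names illustrative:

```python
def _overlap(a: str, b: str) -> float:
    """Fraction of a's tokens that appear in b (toy proxy for an LLM judge)."""
    a_tok, b_tok = set(a.lower().split()), set(b.lower().split())
    return len(a_tok & b_tok) / len(a_tok) if a_tok else 0.0

def rag_triad(question: str, chunks: list[str], answer: str) -> dict[str, float]:
    """Score the three RAG Triad legs; each value lies in [0, 1]."""
    context = " ".join(chunks)
    return {
        "context_relevance": _overlap(question, context),  # chunks vs question
        "groundedness": _overlap(answer, context),         # answer vs chunks
        "answer_relevance": _overlap(answer, question),    # answer vs question
    }

scores = rag_triad(
    "what is vset",
    ["vset is a voltage setting"],
    "vset sets voltage",
)
```

In TruLens proper, each leg is a separate `Feedback` with selectors pointing at the question, the retrieval step's return values, and the final answer, so each trace gets all three scores.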

Does TruLens need a backend?

No. By default it writes traces to a local SQLite database and serves a local dashboard. Snowflake users can log to a Snowflake database and run Cortex-backed feedback functions for scale.

Sources

  1. TruLens — GitHub — accessed 2026-04-20
  2. TruLens — docs — accessed 2026-04-20