TruLens
TruLens instruments your LLM app, records every call as a trace, and scores each trace with pluggable feedback functions (groundedness, answer relevance, context relevance, toxicity, or your own custom functions). The local Streamlit dashboard makes it easy to diff experiments and regression-check a RAG pipeline.
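Conceptually, a feedback function is just a callable that maps app inputs and outputs to a score in [0, 1]. A hand-rolled sketch of the idea (illustrative only; TruLens's built-in feedback functions call an LLM provider to judge quality rather than matching tokens, and the function name here is made up):

```python
def keyword_relevance(prompt: str, response: str) -> float:
    """Toy relevance score: fraction of prompt keywords echoed in the response.

    Illustrative stand-in for a feedback function -- TruLens's real
    relevance feedback asks an LLM provider to grade the pair instead.
    """
    keywords = {w.strip('?.,!').lower() for w in prompt.split() if len(w) > 3}
    if not keywords:
        return 1.0  # nothing to check against
    hits = sum(1 for w in keywords if w in response.lower())
    return hits / len(keywords)

score = keyword_relevance(
    "What is retrieval augmented generation?",
    "Retrieval augmented generation combines search with an LLM.",
)
```

Whatever the scoring logic, the contract is the same: take trace fields in, return a bounded score that the dashboard can aggregate and compare across runs.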
Framework facts
- Category: evals
- Language: Python
- License: MIT
- Repository: https://github.com/truera/trulens
Install
pip install trulens
Quickstart
from trulens.core import TruSession
from trulens.apps.custom import TruCustomApp
from trulens.providers.openai import OpenAI as TruOpenAI
from trulens.core.feedback import Feedback
provider = TruOpenAI()
relevance = Feedback(provider.relevance).on_input_output()
session = TruSession()
# my_app is your existing LLM application object
tru_app = TruCustomApp(my_app, app_id='demo', feedbacks=[relevance])
with tru_app as rec:
    my_app.query('What is VSET?')
session.get_leaderboard()
Alternatives
- Ragas — focused on RAG scoring
- Promptfoo — YAML eval sweeps
- DeepEval — pytest-style assertions
- Arize Phoenix — open observability + evals
Frequently asked questions
What is the RAG Triad?
A popular TruLens eval: (1) context relevance of retrieved chunks, (2) groundedness of the answer in those chunks, (3) relevance of the answer to the question. Scoring all three catches most RAG regressions.
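The three triad scores can be sketched as one function over the (question, retrieved context, answer) triple. The token-overlap heuristic below is a toy stand-in for the LLM judges TruLens actually uses; all names are illustrative:

```python
def _overlap(a: str, b: str) -> float:
    """Fraction of a's tokens that also appear in b (toy proxy for an LLM judge)."""
    ta = {w.strip('?.,!').lower() for w in a.split()}
    tb = {w.strip('?.,!').lower() for w in b.split()}
    return len(ta & tb) / len(ta) if ta else 1.0

def rag_triad(question: str, context: str, answer: str) -> dict:
    """Score one RAG interaction on the three triad axes, each in [0, 1]."""
    return {
        "context_relevance": _overlap(question, context),  # (1) on-topic retrieval?
        "groundedness": _overlap(answer, context),         # (2) answer supported by chunks?
        "answer_relevance": _overlap(question, answer),    # (3) answer addresses question?
    }

scores = rag_triad(
    "What is TruLens?",
    "TruLens is an eval library for LLM apps.",
    "TruLens is an eval library.",
)
```

A low score on any one axis points at a different failure: bad retrieval, hallucination, or an off-topic answer, which is why scoring all three catches most RAG regressions.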
Does TruLens need a backend?
No. It writes traces to SQLite by default and serves a local dashboard. Snowflake users can point it at Cortex for scale.
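If the default SQLite file doesn't suit, the session can be pointed at another store via a SQLAlchemy-style URL. A minimal sketch, assuming the TruLens 1.x `database_url` parameter (verify against your installed version):

```python
from trulens.core import TruSession

# Default: traces land in a local SQLite file next to your script.
session = TruSession()

# Or point the session at a different database via a SQLAlchemy URL.
session = TruSession(database_url="sqlite:///trulens_traces.db")
```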
Sources
- TruLens — GitHub — accessed 2026-04-20
- TruLens — docs — accessed 2026-04-20