Patronus AI
Patronus AI (YC S23) is a commercial evaluation platform built around purpose-built judge models. Its flagship Lynx model is trained to detect hallucinations in RAG outputs and consistently outperforms GPT-4-as-a-judge on benchmarks like FaithEval. Patronus also ships FinanceBench and Enterprise-PII evaluators, a Python SDK for scenario testing, and online guardrails you can attach as a proxy layer.
Framework facts
- Category: evals
- Language: Python / TypeScript (SDK)
- License: Proprietary SaaS (Lynx weights Apache-2.0)
Install
pip install patronus

Quickstart
from patronus import Client

client = Client(api_key='PATRONUS_KEY')
result = client.evaluate(
    evaluator='lynx',
    evaluated_model_output='Acme was founded in 1901.',
    evaluated_model_retrieved_context='Acme was founded in 1910.',
    evaluated_model_input='When was Acme founded?',
)
print(result.pass_, result.explanation)

Alternatives
- Arize Phoenix — OSS
- DeepEval — OSS Python
- TruLens — OSS
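The quickstart returns a verdict object with a boolean `pass_` and an `explanation`. A common pattern is to use that verdict as a gate: serve the model's answer only when the hallucination check passes, and fall back otherwise. Below is a minimal, self-contained sketch of that pattern; `LynxStub`, `EvalResult`, and `guarded_answer` are illustrative names, and the stub's string-matching heuristic stands in for a real Lynx call so the example runs offline.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    # Mirrors the two fields used in the quickstart: a boolean verdict
    # and a human-readable explanation from the judge.
    pass_: bool
    explanation: str


class LynxStub:
    """Illustrative stand-in for the real client; not the Patronus API."""

    def evaluate(self, output: str, context: str) -> EvalResult:
        # Toy heuristic: pass only when the output text appears in the
        # retrieved context. A real judge model reasons semantically.
        supported = output.strip(".") in context
        return EvalResult(
            pass_=supported,
            explanation="supported by context" if supported
            else "claim not found in context",
        )


def guarded_answer(output: str, context: str,
                   fallback: str = "I'm not sure.") -> str:
    """Return the model output only if the hallucination check passes."""
    result = LynxStub().evaluate(output, context)
    return output if result.pass_ else fallback


print(guarded_answer("Acme was founded in 1910", "Acme was founded in 1910."))
# → Acme was founded in 1910
print(guarded_answer("Acme was founded in 1901", "Acme was founded in 1910."))
# → I'm not sure.
```

In production this gate would sit in the proxy/guardrail layer mentioned above, with the stub swapped for a real evaluator call.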
Frequently asked questions
Is Lynx open-source?
Yes — Patronus released Lynx-70B and Lynx-8B weights under Apache-2.0 on Hugging Face. The Patronus platform adds datasets, orchestration, and dashboards on top.
How is Patronus different from Arize Phoenix?
Phoenix is open-source and observability-first. Patronus is eval-first, with managed judge models and curated domain benchmarks, which suits regulated teams that want vendor-supported evaluators.
Sources
- Patronus docs — accessed 2026-04-20
- Patronus home — accessed 2026-04-20