
Patronus AI

Patronus AI (YC S23) is a commercial evaluation platform built around purpose-built judge models. Its flagship Lynx model is trained to detect hallucinations in RAG outputs; Patronus reports that it outperforms GPT-4o as a judge on its HaluBench hallucination benchmark. Patronus also ships FinanceBench and Enterprise-PII evaluators, a Python SDK for scenario testing, and online guardrails you can attach as a proxy layer.

Framework facts

Category
evals
Language
Python / TypeScript (SDK)
License
Proprietary SaaS (Lynx weights Apache-2.0)

Install

pip install patronus

Quickstart

from patronus import Client

# Authenticate with your Patronus API key (inline here for brevity;
# prefer reading it from an environment variable in real code).
client = Client(api_key='PATRONUS_KEY')

# Ask the Lynx judge whether the model output is faithful to the
# retrieved context for the given input question.
result = client.evaluate(
    evaluator='lynx',
    evaluated_model_input='When was Acme founded?',
    evaluated_model_retrieved_context='Acme was founded in 1910.',
    evaluated_model_output='Acme was founded in 1901.',
)

# pass_ should be False here: the output contradicts the retrieved context.
print(result.pass_, result.explanation)
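A common pattern is to use the verdict as a gate: only return a RAG answer to the user if the judge passes it. A minimal sketch, where `guarded_answer` and `HallucinationError` are hypothetical names (not part of the Patronus SDK) and `evaluate_fn` stands in for a call like `client.evaluate`:

```python
class HallucinationError(Exception):
    """Raised when the judge flags the answer as unfaithful."""


def guarded_answer(question, context, answer, evaluate_fn):
    """Return the answer only if the judge passes it, else raise."""
    result = evaluate_fn(
        evaluated_model_input=question,
        evaluated_model_retrieved_context=context,
        evaluated_model_output=answer,
    )
    if not result["pass"]:
        raise HallucinationError(result["explanation"])
    return answer


# Stub judge for local testing: fail when the answer is not supported
# verbatim by the context (a real judge reasons; it does not substring-match).
def stub_judge(evaluated_model_input, evaluated_model_retrieved_context,
               evaluated_model_output):
    ok = evaluated_model_output in evaluated_model_retrieved_context
    return {"pass": ok, "explanation": "supported" if ok else "contradicted"}


print(guarded_answer(
    "When was Acme founded?",
    "Acme was founded in 1910.",
    "Acme was founded in 1910.",
    stub_judge,
))
```

Swapping `stub_judge` for the real SDK call gives you a fail-closed guard in front of your RAG pipeline.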

Alternatives

  • Arize Phoenix — OSS
  • DeepEval — OSS Python
  • TruLens — OSS

Frequently asked questions

Is Lynx open-source?

Yes — Patronus released Lynx-70B and Lynx-8B weights under Apache-2.0 on Hugging Face. The Patronus platform adds datasets, orchestration, and dashboards on top.
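Because the weights are open, you can run Lynx yourself instead of going through the SaaS API. A minimal sketch of building its judge prompt, where `lynx_prompt` is a hypothetical helper and the wording paraphrases the QUESTION/DOCUMENT/ANSWER template on the model card (check the Hugging Face card for the exact prompt):

```python
def lynx_prompt(question: str, document: str, answer: str) -> str:
    """Build a Lynx-style faithfulness prompt (paraphrased template)."""
    return (
        "Given the following QUESTION, DOCUMENT and ANSWER, determine "
        "whether the ANSWER is faithful to the DOCUMENT. Respond in JSON "
        'with two fields: "REASONING" and "SCORE" (PASS or FAIL).\n\n'
        f"QUESTION:\n{question}\n\n"
        f"DOCUMENT:\n{document}\n\n"
        f"ANSWER:\n{answer}\n"
    )


prompt = lynx_prompt(
    "When was Acme founded?",
    "Acme was founded in 1910.",
    "Acme was founded in 1901.",
)
# Feed `prompt` to the Lynx weights (e.g. via transformers or vLLM)
# and parse the JSON verdict from the completion.
print(prompt)
```

This keeps evaluation fully on-premises, at the cost of hosting the model and parsing the JSON verdict yourself.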

How is Patronus different from Arize Phoenix?

Phoenix is OSS and observability-first. Patronus is eval-first, with managed judge models and curated domain benchmarks, which suits regulated teams that want vendored evaluators.

Sources

  1. Patronus docs — accessed 2026-04-20
  2. Patronus home — accessed 2026-04-20