Capability · Framework — evals

Confident AI

Confident AI is the company that maintains DeepEval, the popular open-source LLM-eval framework. Their hosted platform adds dataset versioning, an eval experiments UI, red-teaming scenarios, prompt management, and production monitoring, all using the same metric definitions as the OSS library — so you can prototype locally and scale without rewriting evaluators.

Framework facts

Category
evals
Language
Python (SDK)
License
Proprietary SaaS (DeepEval OSS Apache-2.0)

Install

pip install deepeval
deepeval login

Quickstart

from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test = LLMTestCase(input="Capital of France?", actual_output="Paris")
# Each metric scores the test case; scores below the threshold mark it failed
evaluate(test_cases=[test], metrics=[AnswerRelevancyMetric(threshold=0.7)])
# Results sync to app.confident-ai.com

Alternatives

  • Patronus AI — managed judges
  • Braintrust — evals + data
  • Arize Phoenix — OSS

Frequently asked questions

Is this just hosted DeepEval?

Mostly — the value-add is collaboration (datasets, experiments, team dashboards) and production features (monitoring, alerting, red-team sweeps) on top of the OSS library. You can start free and stay OSS-only if you don't need the collaboration layer.

Does it integrate with CI?

Yes — DeepEval ships a pytest-based runner (`deepeval test run`) that reports results to Confident AI, so you can fail CI whenever a metric score regresses below its threshold.
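The fail-on-regression idea reduces to comparing a metric score against a threshold and raising on failure. A minimal self-contained sketch of that logic — `SimpleMetric` and its toy token-overlap scorer are illustrative stand-ins, not DeepEval's actual metric implementation:

```python
from dataclasses import dataclass


@dataclass
class SimpleMetric:
    """Toy threshold metric mirroring the pass/fail shape of an eval metric."""

    threshold: float

    def measure(self, expected: str, actual: str) -> float:
        # Toy scorer: fraction of expected tokens present in the actual output.
        e, a = set(expected.lower().split()), set(actual.lower().split())
        return len(e & a) / max(len(e), 1)


def run_eval(expected: str, actual: str, metric: SimpleMetric) -> bool:
    """Return True when the score meets the threshold (a passing test case)."""
    return metric.measure(expected, actual) >= metric.threshold


# In a CI context this would be an assert inside a pytest test function,
# so any below-threshold score fails the build.
metric = SimpleMetric(threshold=0.7)
assert run_eval("Paris is the capital", "Paris is the capital", metric)
```

With the real library the same pattern is an assertion over an `LLMTestCase` inside a pytest test file, executed by the DeepEval runner.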

Sources

  1. Confident AI docs — accessed 2026-04-20
  2. DeepEval GitHub — accessed 2026-04-20