Confident AI
Confident AI is the company that maintains DeepEval, the popular open-source LLM-eval framework. Their hosted platform adds dataset versioning, an eval experiments UI, red-teaming scenarios, prompt management, and production monitoring, all using the same metric definitions as the OSS library — so you can prototype locally and scale without rewriting evaluators.
Framework facts
- Category: evals
- Language: Python (SDK)
- License: Proprietary SaaS (DeepEval OSS is Apache-2.0)
Install
pip install deepeval
deepeval login
Quickstart
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase
test = LLMTestCase(input='Capital of France?', actual_output='Paris')
evaluate(test_cases=[test], metrics=[AnswerRelevancyMetric(threshold=0.7)])
# Results sync to app.confident-ai.com
Alternatives
- Patronus AI — managed judges
- Braintrust — evals + data
- Arize Phoenix — OSS
Frequently asked questions
Is this just hosted DeepEval?
Mostly — the value-add is collaboration (datasets, experiments, team dashboards) and production features (monitoring, alerting, red-team sweeps) on top of the OSS library. You can start free and stay OSS-only if you don't need the collaboration layer.
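Staying OSS-only usually means writing custom metrics against DeepEval's metric interface (a `measure` method plus an `is_successful` threshold check). A minimal toy sketch of that shape, in pure Python rather than the real library, with keyword overlap standing in for an LLM judge:

```python
# Toy sketch mirroring the shape of a DeepEval-style custom metric
# (measure / is_successful with a threshold). Not the real library:
# keyword-overlap scoring here is a stand-in for an LLM-judged score.
from dataclasses import dataclass


@dataclass
class TestCase:
    input: str
    actual_output: str


class OverlapRelevancyMetric:
    def __init__(self, threshold: float = 0.7):
        self.threshold = threshold
        self.score = None

    def measure(self, case: TestCase) -> float:
        # Score = fraction of question words that appear in the answer.
        q = {w.strip("?.,!").lower() for w in case.input.split()}
        a = {w.strip("?.,!").lower() for w in case.actual_output.split()}
        self.score = len(q & a) / len(q) if q else 0.0
        return self.score

    def is_successful(self) -> bool:
        return self.score is not None and self.score >= self.threshold
```

In the real library you would subclass `deepeval.metrics.BaseMetric` instead, and the same class runs both locally and on the platform.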
Does it integrate with CI?
Yes — DeepEval ships a pytest integration (run via `deepeval test run`) that reports results to Confident AI, making it easy to fail CI on regressions.
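The CI gate can be sketched as a plain regression check: compare this run's metric scores against a stored baseline and fail the test if any metric dropped. This is illustrative only (the real plugin computes scores with LLM-judged metrics; the baseline numbers below are hypothetical):

```python
# Illustrative CI regression gate, not the DeepEval plugin itself.
# Fails a pytest-style test when any metric drops more than `tolerance`
# below its baseline score. Baseline values are hypothetical.

BASELINE = {"answer_relevancy": 0.82, "faithfulness": 0.91}


def check_regressions(scores, baseline, tolerance=0.05):
    """Return the names of metrics that dropped more than `tolerance`."""
    return [
        name
        for name, base in baseline.items()
        if scores.get(name, 0.0) < base - tolerance
    ]


def test_no_metric_regressions():
    current = {"answer_relevancy": 0.84, "faithfulness": 0.90}  # this run
    assert check_regressions(current, BASELINE) == []
```

A tolerance band like this avoids flaky failures from small score jitter while still catching genuine drops.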
Sources
- Confident AI docs — accessed 2026-04-20
- DeepEval GitHub — accessed 2026-04-20