Promptfoo
Promptfoo turns prompt evaluation into something closer to a standard test suite. You describe providers, test cases, and assertions in YAML, run `promptfoo eval`, and get comparison grids across OpenAI, Anthropic, Gemini, and local models. It also ships a red-teaming module for jailbreak and safety testing.
Framework facts

- Category: evals
- Language: TypeScript
- License: MIT
- Repository: https://github.com/promptfoo/promptfoo
Install

```shell
npx promptfoo@latest init
# or global
npm install -g promptfoo
```

Quickstart
```yaml
# promptfooconfig.yaml
providers:
  - anthropic:messages:claude-opus-4-7
  - openai:chat:gpt-4o
prompts:
  - 'Summarise {{text}} in one sentence.'
tests:
  - vars:
      text: 'VSET is a Delhi-based engineering school.'
    assert:
      - type: contains
        value: 'Delhi'
```

Run the eval and open the results viewer:

```shell
npx promptfoo eval && npx promptfoo view
```

Alternatives
- DeepEval — Python-first assertions
- Ragas — RAG-specific metrics
- Inspect AI — UK AISI safety evals
- OpenAI Evals — first-party registry
Frequently asked questions
What's Promptfoo's superpower?
Matrix sweeps. You can compare dozens of prompts across multiple providers side-by-side with one command, which makes picking a prompt or model straightforward.
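As a sketch, a two-prompt, two-provider sweep looks like the config below; promptfoo runs every prompt against every provider for each test case. The prompt wordings here are invented for illustration, and the model IDs mirror the quickstart above:

```yaml
# promptfooconfig.yaml — 2 prompts × 2 providers = 4 result cells per test case
providers:
  - openai:chat:gpt-4o
  - anthropic:messages:claude-opus-4-7
prompts:
  - 'Summarise {{text}} in one sentence.'
  - 'Write a one-line TL;DR of: {{text}}'
tests:
  - vars:
      text: 'VSET is a Delhi-based engineering school.'
    assert:
      - type: contains
        value: 'Delhi'
```

Every assertion is evaluated in each of the four cells, so the comparison grid shows directly which prompt/provider pair passes most often.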
Can Promptfoo evaluate RAG pipelines?
Yes — wrap your pipeline as a custom provider. It supports LLM-as-judge, keyword, regex, and custom JavaScript/Python assertions.
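A custom provider is a module exposing an `id()` and an async `callApi()` that returns an `{ output }` object promptfoo can assert against. A minimal sketch, where `retrieveDocs` and `generateAnswer` are hypothetical stand-ins for a real retrieval step and LLM call:

```javascript
// rag-provider.js — referenced from promptfooconfig.yaml as:
//   providers:
//     - file://rag-provider.js

// Placeholder retrieval: a real pipeline would query a vector store.
async function retrieveDocs(query) {
  return ['VSET is a Delhi-based engineering school.'];
}

// Placeholder generation: a real pipeline would call an LLM with the context.
async function generateAnswer(query, docs) {
  return `Based on ${docs.length} document(s): ${docs[0]}`;
}

class RagProvider {
  id() {
    return 'my-rag-pipeline';
  }

  async callApi(prompt /*, context */) {
    const docs = await retrieveDocs(prompt);
    const output = await generateAnswer(prompt, docs);
    // promptfoo runs the test's assertions against `output`.
    return { output };
  }
}

module.exports = RagProvider;
```

With this in place, the quickstart's `contains` assertion (or an LLM-as-judge rubric) applies to the end-to-end pipeline output rather than to a raw model call.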
Sources
- Promptfoo — GitHub — accessed 2026-04-20
- Promptfoo — docs — accessed 2026-04-20