Capability · Framework — evals

Promptfoo

Promptfoo turns prompt evaluation into something closer to a standard test suite. You describe providers, test cases, and assertions in YAML, run `promptfoo eval`, and get comparison grids across OpenAI, Anthropic, Gemini, and local models. It also ships a red-teaming module for jailbreak and safety testing.

Framework facts

Category: evals
Language: TypeScript
License: MIT
Repository: https://github.com/promptfoo/promptfoo

Install

npx promptfoo@latest init
# or global
npm install -g promptfoo

Quickstart

# promptfooconfig.yaml
providers:
  # Model IDs are examples — substitute any models your API keys support.
  - anthropic:messages:claude-3-5-sonnet-20241022
  - openai:chat:gpt-4o
prompts:
  - 'Summarise {{text}} in one sentence.'
tests:
  - vars:
      text: 'VSET is a Delhi-based engineering school.'
    assert:
      - type: contains
        value: 'Delhi'
# run
# npx promptfoo eval && npx promptfoo view

Alternatives

  • DeepEval — Python-first assertions
  • Ragas — RAG-specific metrics
  • Inspect AI — UK AISI safety evals
  • OpenAI Evals — first-party registry

Frequently asked questions

What's Promptfoo's superpower?

Matrix sweeps. You can compare dozens of prompts across multiple providers side-by-side with one command, which makes picking a prompt or model straightforward.
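A sweep is just longer lists in the same config: promptfoo runs every prompt against every provider for each test case. A minimal sketch (model IDs and the second prompt are illustrative, not from the page above):

```yaml
# Hypothetical sweep: 2 prompts × 2 providers = 4 cells per test case.
prompts:
  - 'Summarise {{text}} in one sentence.'
  - 'Give a one-line TL;DR of {{text}}.'
providers:
  - anthropic:messages:claude-3-5-sonnet-20241022
  - openai:chat:gpt-4o
tests:
  - vars:
      text: 'VSET is a Delhi-based engineering school.'
    assert:
      - type: contains
        value: 'Delhi'
```

`promptfoo view` then renders the results as a grid, one cell per prompt/provider pair.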

Can Promptfoo evaluate RAG pipelines?

Yes — wrap your pipeline as a custom provider. It supports LLM-as-judge, keyword, regex, and custom JavaScript/Python assertions.
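A custom provider is a module exporting a `callApi` function that returns an object with an `output` field, which promptfoo runs assertions against. A minimal sketch, assuming a JavaScript provider loaded via a `file://` path in the providers list — `retrieve` and `generate` are hypothetical stand-ins for your own pipeline stages:

```javascript
// provider.js — hypothetical RAG pipeline wrapped as a promptfoo provider.

// Stand-in retrieval step: replace with your vector-store lookup.
function retrieve(query) {
  return ['VSET is a Delhi-based engineering school.'];
}

// Stand-in generation step: replace with your LLM call.
function generate(query, docs) {
  return `Answer to "${query}" grounded in ${docs.length} document(s).`;
}

// Promptfoo calls this with the rendered prompt; assertions run
// against the returned `output` string.
async function callApi(prompt) {
  const docs = retrieve(prompt);
  return { output: generate(prompt, docs) };
}

module.exports = { callApi };
```

Reference it from the config as `- file://provider.js`, and the same `contains`, regex, or LLM-as-judge assertions apply to the pipeline's end-to-end output.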

Sources

  1. Promptfoo — GitHub — accessed 2026-04-20
  2. Promptfoo — docs — accessed 2026-04-20