Guidance vs Outlines

When you need an LLM to return output that exactly conforms to a schema — JSON with specific fields, a regex-matching string, a CFG-valid program — you reach for a constrained-generation library. Guidance (Microsoft) and Outlines are the two open-source front-runners. Both work by biasing the model's token probabilities at sampling time. Both integrate with open-weights models; both have evolved to work with API models via structured outputs.

Side-by-side

| Criterion | Guidance | Outlines |
|---|---|---|
| Maintainer | guidance-ai (originally Microsoft Research) | .txt (dottxt.co) |
| License | MIT | Apache 2.0 |
| Output constraints supported | Regex, CFG, JSON Schema, Pydantic | Regex, CFG, JSON Schema, Pydantic, function signatures |
| Templating language | Rich handlebars-style templates with inline generation | Pure Python (no template DSL) |
| Model backends | Transformers, llama.cpp, OpenAI-compatible | Transformers, vLLM (best), llama.cpp, OpenAI, Anthropic |
| vLLM integration | Possible | First-class (native) |
| Structured JSON output | Strong | Strong, very ergonomic |
| Performance overhead | Moderate | Low (compiled FSM for constraint checking) |
| Learning curve | Medium (template DSL is powerful but distinctive) | Low (plain Python functions) |

Verdict

Outlines is the smoother default for most Python developers in 2026 — Pythonic API, tight vLLM integration, fast compiled finite-state-machine constraint checking, and first-class support for major open-weights model servers. Guidance is the better pick when you want its expressive template language for complex multi-step programs where generation and logic are interleaved. For API-only models like Claude Opus 4.7 or GPT-5, you usually don't need either — use the vendor's native structured-outputs JSON mode. Both libraries still shine for open-weights deployments.

When to choose each

Choose Guidance if…

  • You want an expressive templating DSL mixing generation and logic.
  • Your workflow has complex interleaving of prompts and constraints.
  • You're building a chat-like experience with fine-grained control.
  • You prefer Microsoft-origin tooling.

Choose Outlines if…

  • You serve open-weights models with vLLM and need first-class integration.
  • You want a clean Pythonic API with minimal DSL overhead.
  • Low-latency constrained generation matters.
  • You want the same library to work across many backends (vLLM, Transformers, OpenAI).

Frequently asked questions

Do I need Guidance or Outlines if my model has structured outputs?

If you're on GPT-5, Claude, or Gemini via API, their native structured outputs (JSON schema) are usually enough. You need Guidance/Outlines primarily for open-weights models or when you need regex/grammar constraints beyond JSON schema.
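In that native path, the constraint is just a JSON Schema handed to the provider's structured-outputs parameter. A hypothetical example follows (field names invented for illustration; some providers additionally require `additionalProperties: false` and every property listed in `required`):

```python
import json

# Hypothetical extraction schema. The exact parameter it is passed to
# varies by provider, so only the schema itself is shown here.
INVOICE_SCHEMA = {
    "name": "invoice_extraction",
    "schema": {
        "type": "object",
        "properties": {
            "invoice_id": {"type": "string"},
            "total_cents": {"type": "integer"},
            "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
        },
        # Strict-mode conventions: closed object, all fields required.
        "required": ["invoice_id", "total_cents", "currency"],
        "additionalProperties": False,
    },
}

print(json.dumps(INVOICE_SCHEMA["schema"], indent=2))
```

If your constraint fits in this shape, the vendor path spares you a dependency; once you need a regex or a full grammar, you are back in Guidance/Outlines territory.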

Which is faster at constrained generation?

Outlines' compiled FSM approach is generally faster than Guidance for regex/JSON constraints. For simple JSON schemas the overhead of either is negligible. For tight inner loops with complex constraints, benchmark both on your actual model.
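To see why compilation helps, here is a toy sketch of the idea (not Outlines' actual implementation): pay the cost of walking the automaton over every vocabulary token once, up front, so that each decoding step becomes a table lookup rather than a fresh regex scan over the whole vocabulary.

```python
# Toy DFA for the pattern [0-9]+ (state 0 = start, state 1 = accepting).
# A real compiler derives this automaton from an arbitrary regex.

VOCAB = ["1", "23", "4a", "x", "007"]

def dfa_step(state, text):
    """Advance the [0-9]+ DFA over `text`; return None on a dead end."""
    for ch in text:
        if not ch.isdigit():
            return None
        state = 1
    return state

# One-time "compilation": for every DFA state, which vocab tokens
# keep the automaton alive?
ALLOWED = {
    s: [t for t in VOCAB if dfa_step(s, t) is not None]
    for s in (0, 1)
}

# At decode time, the per-step cost is a dictionary lookup,
# independent of constraint complexity.
print(ALLOWED[0])  # -> ['1', '23', '007']
```

The precomputation grows with vocabulary size and automaton size, which is why the win shows up on long generations and tight serving loops rather than one-off calls.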

Can I use these with llama.cpp?

Yes, both have llama.cpp bindings. Outlines has a more direct integration via the Python llama-cpp bindings; Guidance works through its llama.cpp backend. For local dev on a Mac, either works well.

Sources

  1. Guidance — GitHub — accessed 2026-04-20
  2. Outlines — Docs — accessed 2026-04-20