Llama Guard 3

Llama Guard 3 is Meta's third-generation safety classifier, shipped alongside Llama 3.1 in 2024. It fine-tunes an 8-billion-parameter Llama 3.1 backbone to label both user inputs and model outputs against a 14-category hazards taxonomy (aligned with the MLCommons standard), and it runs cheaply enough to sit in-line on every request.
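Llama Guard's output is a short text verdict: a first line of "safe" or "unsafe", and, when unsafe, a second line of comma-separated category codes (S1 through S14). A minimal parser for that format might look like this (the helper name is illustrative, and an empty or malformed verdict is treated as unsafe to fail closed):

```python
def parse_verdict(raw: str) -> tuple[bool, list[str]]:
    """Parse a Llama Guard 3 verdict string.

    The first output line is "safe" or "unsafe"; when unsafe, the
    next line lists the violated category codes, e.g. "S1,S10".
    Anything that is not an explicit "safe" is treated as unsafe.
    """
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if lines and lines[0].lower() == "safe":
        return True, []
    categories = lines[1].split(",") if len(lines) > 1 else []
    return False, [c.strip() for c in categories]
```

Failing closed on empty output is a deliberate choice here: a moderation layer that defaults to "safe" when the classifier misbehaves quietly stops moderating.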

Model specs

Vendor
Meta
Family
Llama Guard
Released
2024-07
Context window
8,192 tokens
Modalities
text

Strengths

  • Open weights — full inspection and customisation
  • Config-driven taxonomy lets teams pick which categories to enforce
  • Easy to deploy on a single GPU alongside generation
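Because the taxonomy is config-driven, a deployment can act only on the categories its policy cares about and ignore the rest. A toy sketch of that filtering step (the specific codes chosen here are illustrative, not a recommended policy):

```python
# Hypothetical policy: enforce only a subset of the 14 hazard codes.
ENFORCED_CATEGORIES = {"S1", "S9", "S10"}

def violates_policy(flagged: list[str]) -> bool:
    """True if any category the classifier flagged is one we enforce."""
    return bool(ENFORCED_CATEGORIES.intersection(flagged))
```

A verdict flagging only un-enforced categories then passes through, so tightening or loosening the policy is a config change rather than a retrain.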

Limitations

  • Running an 8B classifier in-line adds more latency than a hosted moderation endpoint such as OpenAI's
  • Custom categories require retraining or few-shot prompting
  • Safety coverage is English-biased despite multilingual training

Use cases

  • Prompt- and response-level moderation for LLM apps
  • Policy-configurable content filtering with open weights
  • Education and research on open moderation systems
  • Bootstrapping custom moderation fine-tunes

Benchmarks

Benchmark
MLCommons AILuminate
Score
competitive F1 across 14 categories
As of
2024-10

Frequently asked questions

What is Llama Guard 3?

Llama Guard 3 is Meta's open-weights safety classifier, built on Llama 3.1 8B and tuned to label user prompts and model outputs against a 14-category hazards taxonomy.

How do I use Llama Guard 3?

Teams typically run Llama Guard 3 as an in-line moderation step: classify the user's prompt before generation and the model's response after it, then block or regenerate whenever the classifier flags an unsafe category.
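That gating flow can be sketched model-agnostically. In this sketch, `generate` stands in for the LLM call and `classify` for a Llama Guard-style verdict function; both names, and the refusal message, are illustrative:

```python
REFUSAL = "Sorry, I can't help with that."

def moderate_turn(prompt: str, generate, classify) -> str:
    """In-line moderation: check the prompt, generate, then check the reply.

    `classify` is assumed to return a Llama Guard-style verdict string
    whose first line is "safe" or "unsafe"; both callables are injected
    so the gating logic stays independent of any particular model stack.
    """
    if classify(prompt).strip().splitlines()[0] != "safe":
        return REFUSAL  # block unsafe input before spending a generation
    reply = generate(prompt)
    if classify(reply).strip().splitlines()[0] != "safe":
        return REFUSAL  # block unsafe output even from a safe prompt
    return reply
```

Classifying both sides of the turn is what distinguishes this pattern from simple input filtering: a benign prompt can still elicit an unsafe completion.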

How does Llama Guard 3 differ from OpenAI's moderation?

Both label content against a safety taxonomy, but Llama Guard 3 is open-weights and configurable, while OpenAI's moderation is a hosted endpoint. Llama Guard 3 gives you more control; OpenAI's is cheaper at scale.

Sources

  1. Hugging Face — meta-llama/Llama-Guard-3-8B — accessed 2026-04-20
  2. Meta — Llama 3.1 launch — accessed 2026-04-20