Curiosity · AI Model
Llama Guard 3
Llama Guard 3 is Meta's third-generation safety classifier, shipped alongside Llama 3.1 in 2024. It uses an 8-billion-parameter Llama backbone fine-tuned on Meta's hazards taxonomy to label both user inputs and model outputs across 14 risk categories, and runs cheaply enough to sit in-line on every request.
Model specs
- Vendor: Meta
- Family: Llama Guard
- Released: 2024-07
- Context window: 8,192 tokens
- Modalities: text
Strengths
- Open weights — full inspection and customisation
- Config-driven taxonomy lets teams pick which categories to enforce
- Easy to deploy on a single GPU alongside generation
Limitations
- An 8B classifier adds more inference latency than a lightweight hosted endpoint such as OpenAI's moderation API
- Custom categories require retraining or few-shot prompting
- Safety coverage is English-biased despite multilingual training
Use cases
- Prompt- and response-level moderation for LLM apps
- Policy-configurable content filtering with open weights
- Education and research on open moderation systems
- Bootstrapping custom moderation fine-tunes
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| MLCommons AILuminate | competitive F1 across 14 categories | 2024-10 |
Frequently asked questions
What is Llama Guard 3?
Llama Guard 3 is Meta's open-weights safety classifier, built on Llama 3.1 8B and tuned to label user prompts and model outputs against a 14-category hazards taxonomy.
How do I use Llama Guard 3?
Teams typically run Llama Guard 3 as an in-line moderation step: classify the user's prompt before generation and the model's response after it, then block, redact, or re-generate whenever the classifier flags an unsafe category.
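The gating step above can be sketched as follows. This is a minimal sketch, not Meta's reference code: it assumes Llama Guard 3's documented output convention (the model emits `safe`, or `unsafe` followed by a line of violated category codes such as `S1,S10`), and the `classify` callable is a placeholder for your actual model invocation.

```python
def parse_verdict(raw: str) -> tuple[bool, list[str]]:
    """Parse a Llama Guard 3 completion into (is_safe, categories).

    Assumes the "safe" / "unsafe\nS1,S2" output convention.
    """
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines or lines[0].lower() == "safe":
        return True, []
    # Remaining lines hold comma-separated category codes, e.g. "S1,S10".
    cats = [c.strip() for ln in lines[1:] for c in ln.split(",") if c.strip()]
    return False, cats


def moderate(prompt: str, response: str, classify) -> str:
    """Gate a generation: pass the response through only if both sides are safe.

    `classify(role, text)` is a hypothetical hook that runs Llama Guard 3
    on one turn and returns its raw completion string.
    """
    for role, text in (("user", prompt), ("assistant", response)):
        is_safe, cats = parse_verdict(classify(role, text))
        if not is_safe:
            return f"[blocked: {role} content flagged {','.join(cats)}]"
    return response
```

In a real deployment, `classify` would wrap a call to the Llama-Guard-3-8B checkpoint (e.g. via `transformers` with the model's chat template); here it is left abstract so the gating logic stays self-contained.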
How does Llama Guard 3 differ from OpenAI's moderation?
Both label content against a safety taxonomy, but Llama Guard 3 is open-weights and configurable, while OpenAI's moderation is a hosted endpoint. Llama Guard 3 gives you more control; OpenAI's endpoint avoids the cost of running your own GPU.
Sources
- Hugging Face — meta-llama/Llama-Guard-3-8B — accessed 2026-04-20
- Meta — Llama 3.1 launch — accessed 2026-04-20