Llama Guard 3

Llama Guard 3 is Meta's third-generation safety classifier, shipped alongside Llama 3.1 in 2024. It fine-tunes an 8-billion-parameter Llama 3.1 backbone to label both user inputs and model outputs against a 14-category hazards taxonomy (aligned with the MLCommons standard), and it runs cheaply enough to sit in-line on every request.
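Llama Guard's output is a short text verdict: a first line of "safe" or "unsafe", and, when unsafe, a second line of comma-separated category codes (S1 through S14). A minimal parser for that format might look like this (the helper name is illustrative, and an empty or malformed verdict is treated as unsafe to fail closed):

```python
def parse_verdict(raw: str) -> tuple[bool, list[str]]:
    """Parse a Llama Guard 3 verdict string.

    The first output line is "safe" or "unsafe"; when unsafe, the
    next line lists the violated category codes, e.g. "S1,S10".
    Anything that is not an explicit "safe" is treated as unsafe.
    """
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if lines and lines[0].lower() == "safe":
        return True, []
    categories = lines[1].split(",") if len(lines) > 1 else []
    return False, [c.strip() for c in categories]
```

Failing closed on empty output is a deliberate choice here: a moderation layer that defaults to "safe" when the classifier misbehaves quietly stops moderating.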

Model specs

Vendor
Meta
Family
Llama Guard
Released
2024-07
Context window
8,192 tokens
Modalities
text

Strengths

  • Open weights — full inspection and customisation
  • Config-driven taxonomy lets teams pick which categories to enforce
  • Easy to deploy on a single GPU alongside generation
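Because the taxonomy is config-driven, a deployment can act only on the categories its policy cares about and ignore the rest. A toy sketch of that filtering step (the specific codes chosen here are illustrative, not a recommended policy):

```python
# Hypothetical policy: enforce only a subset of the 14 hazard codes.
ENFORCED_CATEGORIES = {"S1", "S9", "S10"}

def violates_policy(flagged: list[str]) -> bool:
    """True if any category the classifier flagged is one we enforce."""
    return bool(ENFORCED_CATEGORIES.intersection(flagged))
```

A verdict flagging only un-enforced categories then passes through, so tightening or loosening the policy is a config change rather than a retrain.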

Limitations

  • Running an 8B classifier in-line adds more latency than a hosted moderation endpoint such as OpenAI's
  • Custom categories require retraining or few-shot prompting
  • Safety coverage is English-biased despite multilingual training

Use cases

  • Prompt- and response-level moderation for LLM apps
  • Policy-configurable content filtering with open weights
  • Education and research on open moderation systems
  • Bootstrapping custom moderation fine-tunes

Benchmarks

Benchmark
MLCommons AILuminate
Score
competitive F1 across 14 categories
As of
2024-10

Frequently asked questions

What is Llama Guard 3?

Llama Guard 3 is Meta's open-weights safety classifier, built on Llama 3.1 8B and tuned to label user prompts and model outputs against a 14-category hazards taxonomy.

How do I use Llama Guard 3?

Teams typically run Llama Guard 3 as an in-line moderation step: classify the user's prompt before generation and the model's response after it, then block or regenerate whenever the classifier flags an unsafe category.
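That gating flow can be sketched model-agnostically. In this sketch, `generate` stands in for the LLM call and `classify` for a Llama Guard-style verdict function; both names, and the refusal message, are illustrative:

```python
REFUSAL = "Sorry, I can't help with that."

def moderate_turn(prompt: str, generate, classify) -> str:
    """In-line moderation: check the prompt, generate, then check the reply.

    `classify` is assumed to return a Llama Guard-style verdict string
    whose first line is "safe" or "unsafe"; both callables are injected
    so the gating logic stays independent of any particular model stack.
    """
    if classify(prompt).strip().splitlines()[0] != "safe":
        return REFUSAL  # block unsafe input before spending a generation
    reply = generate(prompt)
    if classify(reply).strip().splitlines()[0] != "safe":
        return REFUSAL  # block unsafe output even from a safe prompt
    return reply
```

Classifying both sides of the turn is what distinguishes this pattern from simple input filtering: a benign prompt can still elicit an unsafe completion.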

How does Llama Guard 3 differ from OpenAI's moderation?

Both label content against a safety taxonomy, but Llama Guard 3 is open-weights and configurable, while OpenAI's moderation is a hosted endpoint. Llama Guard 3 gives you more control; OpenAI's is cheaper at scale.

Sources

  1. Hugging Face — meta-llama/Llama-Guard-3-8B — accessed 2026-04-20
  2. Meta — Llama 3.1 launch — accessed 2026-04-20