Emu 2
Emu 2, announced by the Beijing Academy of Artificial Intelligence (BAAI) in late 2023, is a 37-billion-parameter unified multimodal model that can both generate and understand images. It introduced generative multimodal in-context learning: the model is given a few example image-text pairs and produces new images that match the demonstrated style or transformation. It also seeded BAAI's later Emu 3 work on unified image, video, and text generation.
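The in-context pattern is easiest to see as code. The sketch below assembles a few-shot prompt as a single interleaved image-text sequence; `Emu2Client` and `generate_image` are hypothetical stand-ins for illustration, not the released inference code (the public checkpoints, e.g. `BAAI/Emu2-Gen` on Hugging Face, ship their own loading scripts).

```python
from dataclasses import dataclass
from typing import List, Union

from PIL import Image

# A prompt is one ordered sequence mixing text spans and images.
Prompt = List[Union[str, Image.Image]]

@dataclass
class Emu2Client:
    """Hypothetical wrapper, for illustration only. A real client would
    run the autoregressive model over the interleaved sequence and decode
    the predicted visual embeddings into pixels."""
    model_name: str = "BAAI/Emu2-Gen"

    def generate_image(self, prompt: Prompt) -> Image.Image:
        # Stand-in so the sketch runs end to end: return a blank canvas.
        return Image.new("RGB", (512, 512))

# Blank stand-ins for real example images.
dog, dog_sketch = Image.new("RGB", (512, 512)), Image.new("RGB", (512, 512))
cat, cat_sketch = Image.new("RGB", (512, 512)), Image.new("RGB", (512, 512))
bird = Image.new("RGB", (512, 512))

# Two worked examples demonstrate the transformation; the final pair asks
# the model to repeat it on new content.
prompt: Prompt = [
    dog, "the same dog as a pencil sketch:", dog_sketch,
    cat, "the same cat as a pencil sketch:", cat_sketch,
    bird, "the same bird as a pencil sketch:",
]
result = Emu2Client().generate_image(prompt)
result.save("bird_sketch.png")
```

Because the examples are supplied at inference time, the same weights can handle style transfer, editing, and subject-driven generation without task-specific fine-tuning.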
Model specs
- Vendor: BAAI (Beijing Academy of Artificial Intelligence)
- Family: Emu
- Released: 2023-12
- Context window: 4,096 tokens
- Modalities: text, vision
Strengths
- Unified architecture handles both understanding and generation
- Strong in-context learning across modalities
- Seeded BAAI's subsequent Emu 3 research on unified generation
Limitations
- Research release; open weights, but no hosted production API
- Surpassed by dedicated image-generation models on photorealism
- Limited context window by today's standards
Use cases
- Research on generative multimodal in-context learning
- Image editing and style transfer with few-shot prompts
- Multimodal reasoning with mixed image+text prompts (see the sketch after this list)
- Baseline for BAAI's later Emu 3 work on unified image, video, and text generation
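As a sketch of the mixed-prompt reasoning case, the helper below stands in for a real inference call; `answer` is hypothetical, not the released API (the public chat checkpoint is `BAAI/Emu2-Chat`).

```python
from typing import List, Union

from PIL import Image

# Same interleaved format as generation, but the expected output is text.
Prompt = List[Union[str, Image.Image]]

def answer(prompt: Prompt) -> str:
    """Hypothetical helper, for illustration only. A real implementation
    would embed the images with the visual encoder, tokenize the text
    spans, and run the autoregressive decoder to produce an answer."""
    return "placeholder answer"

chart = Image.new("RGB", (512, 512))  # stand-in for a real chart image
photo = Image.new("RGB", (512, 512))  # stand-in for a real photo

# The question refers to both images at once, which is what distinguishes
# multimodal reasoning from single-image captioning.
prompt: Prompt = [
    "Here is a chart of monthly sales:", chart,
    "and a photo of the storefront:", photo,
    "Question: does the busiest month in the chart match the season "
    "shown in the photo? Answer briefly.",
]
print(answer(prompt))
```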
Benchmarks
| Benchmark | Result | As of |
|---|---|---|
| Image-generation CLIP score | competitive with contemporary diffusion models | 2023-12 |
Frequently asked questions
What is Emu 2?
Emu 2 is BAAI's 37-billion-parameter unified multimodal model, released in December 2023, capable of generating images and reasoning over mixed image and text inputs.
Can I use Emu 2 in production?
Emu 2 is a research release; BAAI has published the weights, but there is no hosted production API. Its successor, Emu 3, continues the line as a research model.
What makes Emu 2 novel?
It unified image generation and multimodal understanding in a single autoregressive model, enabling 'show-me-examples' image generation via in-context learning.
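Concretely, "unified autoregressive" means every position in a mixed sequence is trained with next-element prediction: discrete text tokens get a classification loss, while continuous visual embeddings get a regression loss, as described in the Emu papers. The sketch below shows that joint objective schematically; all shapes, names, and the random text/visual split are illustrative, not Emu 2's actual code.

```python
import torch
import torch.nn.functional as F

# Schematic training step for a unified autoregressive objective over a
# multimodal sequence mixing discrete text tokens with continuous
# visual embeddings.
B, T, D, V = 2, 16, 1024, 32000  # batch, sequence length, hidden dim, vocab

hidden = torch.randn(B, T, D)            # transformer outputs (stand-in)
is_text = torch.rand(B, T) < 0.5         # which positions hold text tokens
text_targets = torch.randint(V, (B, T))  # next text token at text positions
visual_targets = torch.randn(B, T, D)    # next visual embedding elsewhere

text_head = torch.nn.Linear(D, V)        # classification head for text
visual_head = torch.nn.Linear(D, D)      # regression head for embeddings

# Cross-entropy on text positions: predict the next discrete token.
ce = F.cross_entropy(text_head(hidden)[is_text], text_targets[is_text])

# Regression on visual positions: predict the next continuous embedding.
mse = F.mse_loss(visual_head(hidden)[~is_text], visual_targets[~is_text])

loss = ce + mse  # one predict-the-next-element loss for both modalities
loss.backward()
```

In the Emu line, the regressed visual embeddings are then rendered to pixels by a separate diffusion-based decoder, which is how a single autoregressive model ends up able to emit images as well as text.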
Sources
- arXiv — Emu 2 paper, "Generative Multimodal Models are In-Context Learners" — accessed 2026-04-20
- BAAI — Emu 2 project page — accessed 2026-04-20