
Emu 2

Emu 2, announced by Meta in late 2023, is a 37-billion-parameter unified multimodal model that can both generate and understand images. It introduced generative multimodal in-context learning — feeding the model example image-text pairs and letting it produce new images that match the demonstrated style or transformation — and seeded later Meta research into video generation.
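As a loose illustration of what such a prompt looks like, the Python sketch below assembles an interleaved few-shot sequence: pairs of (input image, output image) demonstrations followed by a query image the model should transform in the same way. The segment types, helper function, and file names are hypothetical, invented for this sketch; they are not Emu 2's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical segment types for an interleaved image-text prompt.
@dataclass
class TextSegment:
    text: str

@dataclass
class ImageSegment:
    path: str  # path to an image file on disk

Segment = Union[TextSegment, ImageSegment]

def build_incontext_prompt(
    examples: List[Tuple[str, str]],
    query_image: str,
    instruction: str,
) -> List[Segment]:
    """Interleave (input image, output image) demonstrations with a final
    query image, so the model can infer the transformation in context."""
    prompt: List[Segment] = []
    for src, dst in examples:
        prompt.append(ImageSegment(src))         # demonstration input
        prompt.append(TextSegment(instruction))  # shared instruction text
        prompt.append(ImageSegment(dst))         # demonstration output
    prompt.append(ImageSegment(query_image))     # new input to transform
    prompt.append(TextSegment(instruction))      # model generates the image next
    return prompt

# Two demonstrations of a style transfer, then a query image:
prompt = build_incontext_prompt(
    examples=[("photo1.jpg", "sketch1.jpg"), ("photo2.jpg", "sketch2.jpg")],
    query_image="photo3.jpg",
    instruction="Convert this photo into a pencil sketch:",
)
```

The point of the format is that the transformation is never described explicitly; the model is expected to infer it from the demonstrated pairs.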

Model specs

  • Vendor: Meta
  • Family: Emu
  • Released: 2023-12
  • Context window: 4,096 tokens
  • Modalities: text, vision

Strengths

  • Unified architecture handles both understanding and generation
  • Strong in-context learning across modalities
  • Seeded Meta's subsequent Movie Gen research

Limitations

  • Research release — no public production API
  • Surpassed by dedicated image-generation models on photorealism
  • Limited context window by today's standards

Use cases

  • Research on generative multimodal in-context learning
  • Image editing and style transfer with few-shot prompts
  • Multimodal reasoning with mixed image+text prompts
  • Baseline for later Emu Video and Movie Gen work

Benchmarks

  • Image-generation CLIP score: competitive with contemporary diffusion models (as of 2023-12)

Frequently asked questions

What is Emu 2?

Emu 2 is Meta's 37-billion-parameter unified multimodal model, released in December 2023, capable of generating images and reasoning over mixed image and text inputs.

Can I use Emu 2 in production?

Emu 2 is a research release; Meta has not shipped it as a public API. Its successors underpin product features like image generation in Meta AI.

What makes Emu 2 novel?

It unified image generation and multimodal understanding in a single autoregressive model, so the model can be shown example inputs and outputs and asked to produce a matching image via in-context learning.
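One way to picture a unified autoregressive model is a single next-token loop over a shared vocabulary in which some token ids mean text and others mean image content, so the same decoder serves understanding (emitting text tokens) and generation (emitting image tokens). The toy loop below sketches only that idea; the vocabulary split, sizes, and random stand-in "model" are invented for illustration and do not reflect Emu 2's actual tokenization or decoding.

```python
import random

TEXT_VOCAB = range(0, 32_000)        # hypothetical text-token ids
IMAGE_VOCAB = range(32_000, 40_192)  # hypothetical image-token ids

def next_token(context: list) -> int:
    """Stand-in for the model's forward pass; a real model would score
    the whole shared vocabulary given the interleaved context."""
    return random.randrange(40_192)

def generate(context: list, max_new: int) -> list:
    out = list(context)
    for _ in range(max_new):
        tok = next_token(out)
        out.append(tok)
        # Because text and image ids share one sequence, the same loop
        # answers questions (text tokens) or renders pictures (image
        # tokens); a visual decoder would turn image tokens into pixels.
        kind = "text" if tok in TEXT_VOCAB else "image"
    return out

tokens = generate(context=[101, 7, 42], max_new=16)
```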

Sources

  1. arXiv — Emu 2 paper — accessed 2026-04-20
  2. Meta AI Research — Emu 2 — accessed 2026-04-20