
Emu 2

Emu 2, announced by Meta in late 2023, is a 37-billion-parameter unified multimodal model that can both generate and understand images. It introduced generative multimodal in-context learning — feeding the model example image-text pairs and letting it produce new images that match the demonstrated style or transformation — and seeded later Meta research into video generation.
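As a loose illustration of what such a prompt looks like, the Python sketch below assembles an interleaved few-shot sequence: pairs of (input image, output image) demonstrations followed by a query image the model should transform in the same way. The segment types, helper function, and file names are hypothetical, invented for this sketch; they are not Emu 2's actual API.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical segment types for an interleaved image-text prompt.
@dataclass
class TextSegment:
    text: str

@dataclass
class ImageSegment:
    path: str  # path to an image file on disk

Segment = Union[TextSegment, ImageSegment]

def build_incontext_prompt(
    examples: List[Tuple[str, str]],
    query_image: str,
    instruction: str,
) -> List[Segment]:
    """Interleave (input image, output image) demonstrations with a final
    query image, so the model can infer the transformation in context."""
    prompt: List[Segment] = []
    for src, dst in examples:
        prompt.append(ImageSegment(src))         # demonstration input
        prompt.append(TextSegment(instruction))  # shared instruction text
        prompt.append(ImageSegment(dst))         # demonstration output
    prompt.append(ImageSegment(query_image))     # new input to transform
    prompt.append(TextSegment(instruction))      # model generates the image next
    return prompt

# Two demonstrations of a style transfer, then a query image:
prompt = build_incontext_prompt(
    examples=[("photo1.jpg", "sketch1.jpg"), ("photo2.jpg", "sketch2.jpg")],
    query_image="photo3.jpg",
    instruction="Convert this photo into a pencil sketch:",
)
```

The point of the format is that the transformation is never described explicitly; the model is expected to infer it from the demonstrated pairs.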

Model specs

  • Vendor: Meta
  • Family: Emu
  • Released: 2023-12
  • Context window: 4,096 tokens
  • Modalities: text, vision

Strengths

  • Unified architecture handles both understanding and generation
  • Strong in-context learning across modalities
  • Seeded Meta's subsequent Movie Gen research

Limitations

  • Research release — no public production API
  • Surpassed by dedicated image-generation models on photorealism
  • Limited context window by today's standards

Use cases

  • Research on generative multimodal in-context learning
  • Image editing and style transfer with few-shot prompts
  • Multimodal reasoning with mixed image+text prompts
  • Baseline for later Emu Video and Movie Gen work

Benchmarks

  • Image-generation CLIP score: competitive with contemporary diffusion models (as of 2023-12)

Frequently asked questions

What is Emu 2?

Emu 2 is Meta's 37-billion-parameter unified multimodal model, released in December 2023, capable of generating images and reasoning over mixed image and text inputs.

Can I use Emu 2 in production?

Emu 2 is a research release; Meta has not shipped it as a public API. Its successors underpin product features like image generation in Meta AI.

What makes Emu 2 novel?

It unified image generation and multimodal understanding in a single autoregressive model, so the model can be shown example inputs and outputs and asked to produce a matching image via in-context learning.
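One way to picture a unified autoregressive model is a single next-token loop over a shared vocabulary in which some token ids mean text and others mean image content, so the same decoder serves understanding (emitting text tokens) and generation (emitting image tokens). The toy loop below sketches only that idea; the vocabulary split, sizes, and random stand-in "model" are invented for illustration and do not reflect Emu 2's actual tokenization or decoding.

```python
import random

TEXT_VOCAB = range(0, 32_000)        # hypothetical text-token ids
IMAGE_VOCAB = range(32_000, 40_192)  # hypothetical image-token ids

def next_token(context: list) -> int:
    """Stand-in for the model's forward pass; a real model would score
    the whole shared vocabulary given the interleaved context."""
    return random.randrange(40_192)

def generate(context: list, max_new: int) -> list:
    out = list(context)
    for _ in range(max_new):
        tok = next_token(out)
        out.append(tok)
        # Because text and image ids share one sequence, the same loop
        # answers questions (text tokens) or renders pictures (image
        # tokens); a visual decoder would turn image tokens into pixels.
        kind = "text" if tok in TEXT_VOCAB else "image"
    return out

tokens = generate(context=[101, 7, 42], max_new=16)
```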

Sources

  1. arXiv — Emu 2 paper — accessed 2026-04-20
  2. Meta AI Research — Emu 2 — accessed 2026-04-20