Curiosity · AI Model
DALL·E 2
DALL·E 2 was OpenAI's April 2022 breakthrough text-to-image system and the model that pushed diffusion-based image generation into the mainstream. It pairs a CLIP text/image embedding space with an "unCLIP" prior that maps text embeddings to image embeddings, then decodes those embeddings with a diffusion model at 64×64, followed by two diffusion upsamplers that scale the output to 256×256 and finally 1024×1024. Variations, inpainting, and outpainting made it one of the dominant creative-AI APIs of 2022-23. It was superseded by DALL·E 3 (2023) and the GPT-Image family, though the legacy API endpoint remains available.
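The two-stage unCLIP pipeline described above can be sketched schematically. Everything below is an illustrative stub, not the real networks: the embedding width, the random-projection "prior," the random-noise "decoder," and the nearest-neighbor "upsampler" are all placeholder assumptions chosen only to show how the stages compose and what shapes flow between them.

```python
import numpy as np

CLIP_DIM = 512  # illustrative embedding width, not the production value


def prior(text_emb):
    # unCLIP prior: maps a CLIP text embedding to a CLIP image embedding.
    # Stubbed here as a fixed random projection.
    rng = np.random.default_rng(0)
    w = rng.standard_normal((CLIP_DIM, CLIP_DIM)) / np.sqrt(CLIP_DIM)
    return text_emb @ w


def decoder(img_emb):
    # Diffusion decoder: conditions on the image embedding and emits a
    # 64x64 RGB image. Stubbed here as noise seeded by the embedding.
    seed = int(abs(float(img_emb.sum())) * 1e3) % 2**32
    return np.random.default_rng(seed).random((64, 64, 3))


def upsample(img, factor):
    # Stand-in for the diffusion upsamplers; here plain nearest-neighbor.
    return img.repeat(factor, axis=0).repeat(factor, axis=1)


text_emb = np.zeros(CLIP_DIM)        # pretend CLIP text embedding
img = decoder(prior(text_emb))       # 64x64 base sample
img = upsample(upsample(img, 4), 4)  # 64 -> 256 -> 1024
print(img.shape)  # (1024, 1024, 3)
```

The point of the sketch is the staging: the prior and the decoder are trained separately, and the decoder never sees the text directly, only the predicted image embedding.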
Model specs
- Vendor
- OpenAI
- Family
- DALL·E
- Released
- 2022-04
- Context window
- n/a (image model; prompts capped at 1,000 characters)
- Modalities
- text, vision
- Input price
- n/a
- Output price
- n/a
- Pricing as of
- 2026-04-20
Strengths
- First consumer diffusion model at scale
- Cheap per-image price relative to contemporaries
- Supports variations, inpainting, outpainting
- Stable, well-documented API
Limitations
- Quality well behind DALL·E 3, Midjourney v6, Imagen 3, SDXL
- Weak on text-in-image rendering
- 1024×1024 max resolution
- Style range narrower than modern open models
Use cases
- Legacy applications still calling the DALL·E 2 API
- Educational reference for diffusion architecture (unCLIP)
- Low-cost image variations and inpainting
- Benchmarking against modern generators
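For legacy callers, generation still goes through the Images endpoint of the official OpenAI Python SDK with `model="dall-e-2"`. A minimal sketch, with the network call commented out since it needs an `OPENAI_API_KEY`; the prompt text is a placeholder:

```python
# Request parameters for the legacy dall-e-2 generation endpoint.
generation_request = {
    "model": "dall-e-2",
    "prompt": "an armchair in the shape of an avocado",  # placeholder prompt
    "n": 1,                # dall-e-2 allows multiple images per request
    "size": "1024x1024",   # dall-e-2 supports 256x256, 512x512, 1024x1024
}

# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# resp = client.images.generate(**generation_request)
# print(resp.data[0].url)
```

Note that dall-e-2 is the only OpenAI image model offering the smaller 256×256 and 512×512 sizes, which is where its low per-image cost comes from.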
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| FID (MS-COCO zero-shot) | ≈10.4 | 2022-04 |
| Human preference vs DALL·E 1 | preferred ~72% of the time | 2022-04 |
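The FID figure above compares the statistics of Inception-network features over generated and real MS-COCO images. For reference, the underlying Fréchet distance between two Gaussians is straightforward to compute; this is a minimal NumPy/SciPy sketch of the formula, not the evaluation code OpenAI used:

```python
import numpy as np
from scipy import linalg


def frechet_distance(mu1, sigma1, mu2, sigma2):
    # FID core: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerics
    return diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean)


# Identical statistics -> distance 0 (up to numerical noise).
mu, sigma = np.zeros(4), np.eye(4)
d = frechet_distance(mu, sigma, mu, sigma)
print(round(float(d), 6))  # 0.0
```

In practice the means and covariances are estimated from Inception-v3 pool features of tens of thousands of images; lower is better, and ≈10.4 was competitive for a zero-shot model in 2022.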
Frequently asked questions
What is DALL·E 2?
DALL·E 2 is OpenAI's 2022 text-to-image diffusion model that uses CLIP-guided priors and cascaded upsamplers to produce 1024×1024 images from text prompts.
Is DALL·E 2 still recommended?
For new work, no — DALL·E 3 and OpenAI's GPT-Image API produce much better results. DALL·E 2 is mainly a legacy option.
Can DALL·E 2 edit existing images?
Yes. It supports variations, inpainting (masked edits), and outpainting (extending beyond the canvas).
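In the API, masked edits go through the `images.edit` method: you supply a square source PNG plus an RGBA mask whose fully transparent pixels mark the region to regenerate. A hedged sketch follows; the file names and prompt are placeholders, and the network call is commented out:

```python
# Request parameters for a dall-e-2 inpainting (masked edit) call.
edit_request = {
    "model": "dall-e-2",
    "image": "living_room.png",  # placeholder: original square PNG
    "mask": "sofa_mask.png",     # placeholder: transparent where to repaint
    "prompt": "a sunlit living room with a green velvet sofa",
    "size": "1024x1024",
}

# from openai import OpenAI
# client = OpenAI()
# with open(edit_request["image"], "rb") as img, open(edit_request["mask"], "rb") as m:
#     resp = client.images.edit(
#         model=edit_request["model"], image=img, mask=m,
#         prompt=edit_request["prompt"], size=edit_request["size"],
#     )
# print(resp.data[0].url)
```

Variations work the same way minus the mask and prompt, via `images.create_variation` with just the source image. Outpainting is the same masked-edit mechanism applied to a canvas larger than the original image.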
Sources
- OpenAI — DALL·E 2 — accessed 2026-04-20
- Hierarchical Text-Conditional Image Generation with CLIP Latents — accessed 2026-04-20