Stable Video Diffusion
Stable Video Diffusion (SVD), introduced by Stability AI in November 2023, is an open-weights image-to-video latent diffusion model. It starts from a still image and produces a short clip of 14 or 25 frames with controllable motion intensity. SVD kicked off the open-weights video-generation wave and underpins many of today's derivative community video tools.
Model specs
- Vendor
- Stability AI
- Family
- Stable Video
- Released
- 2023-11
- Context window
- Not applicable (image-to-video model, no text context)
- Modalities
- image (input), video (output)
Strengths
- Open weights under Stability AI research / commercial license
- Works well with standard Stable Diffusion tooling pipelines
- Community extensions for motion control and longer clips
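The tooling compatibility above can be sketched with Hugging Face `diffusers`, which ships a `StableVideoDiffusionPipeline`. A minimal sketch, assuming the `diffusers` package, downloaded weights, and a CUDA GPU are available; the helper name and defaults are illustrative, not an official API:

```python
def generate_clip(image_path: str, out_path: str = "clip.mp4", num_frames: int = 25):
    """Illustrative helper: animate a still image with SVD via diffusers."""
    # Heavy imports live inside the function so the sketch can be defined
    # (and read) without diffusers installed.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image, export_to_video

    # img2vid-xt is the 25-frame checkpoint; use img2vid for 14 frames.
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    image = load_image(image_path).resize((1024, 576))
    # decode_chunk_size trades VRAM for decoding speed when turning
    # denoised latents back into frames.
    frames = pipe(image, num_frames=num_frames, decode_chunk_size=8).frames[0]
    export_to_video(frames, out_path, fps=7)
    return out_path
```

Because SVD reuses the Stable Diffusion ecosystem, the same pattern of `from_pretrained` plus a pipeline call carries over from image workflows with almost no changes.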
Limitations
- Short clip length (14 or 25 frames) without community extensions
- Quality below commercial generators like Sora or Veo
- No audio — video only
Use cases
- Open-weights research on video diffusion
- Short animation prototyping from reference images
- Motion-controlled cinemagraphs and product shots
- Teaching video-diffusion architectures in class
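For the classroom use case, the model's working tensor shapes are easy to derive by hand. Assuming SVD inherits the standard Stable Diffusion VAE (8x spatial downsampling into 4 latent channels; an assumption about the architecture, stated here rather than sourced from this page), a sketch of the latent-video shape the diffusion backbone denoises:

```python
def svd_latent_shape(num_frames: int, width: int, height: int,
                     downsample: int = 8, channels: int = 4):
    """Latent video tensor shape, assuming the standard SD VAE factors."""
    # Pixel dimensions must divide evenly by the VAE downsampling factor.
    assert width % downsample == 0 and height % downsample == 0
    return (num_frames, channels, height // downsample, width // downsample)

# 25 frames at the maximum official 1024x576 resolution:
print(svd_latent_shape(25, 1024, 576))  # (25, 4, 72, 128)
```

The frame axis is what distinguishes SVD from its image-only ancestor: the denoiser attends across it, which is why clip length is baked into each checkpoint.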
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| Human preference vs. AnimateDiff (internal) | preferred by human raters in the majority of pairwise comparisons | 2023-11 |
Frequently asked questions
What is Stable Video Diffusion?
Stable Video Diffusion is Stability AI's open-weights image-to-video latent diffusion model, generating short 14- or 25-frame clips from a single reference image using a Stable Diffusion–based architecture.
How long are SVD clips?
Official checkpoints generate 14 or 25 frames at up to 1024x576 resolution. Community extensions stretch this further with interpolation and long-video pipelines.
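In wall-clock terms, the official frame counts work out to only a few seconds, assuming the 7 fps playback rate commonly used when exporting SVD clips (the exact fps is configurable at export time):

```python
def clip_seconds(num_frames: int, fps: int = 7) -> float:
    """Playback duration of a clip at a given frame rate."""
    return num_frames / fps

print(round(clip_seconds(14), 2))  # 2.0  seconds for the 14-frame checkpoint
print(round(clip_seconds(25), 2))  # 3.57 seconds for the 25-frame checkpoint
```

This is why interpolation and chained long-video pipelines are popular community extensions: the base checkpoints alone cover roughly two to four seconds of motion.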
What license covers SVD?
Stability AI ships SVD under its own research/commercial license; commercial use requires a paid Stability membership above certain revenue thresholds.
Sources
- Stability AI — Stable Video Diffusion — accessed 2026-04-20
- Hugging Face — stabilityai/stable-video-diffusion-img2vid — accessed 2026-04-20