Stable Video Diffusion
Stable Video Diffusion (SVD), introduced by Stability AI in November 2023, is an open-weights image-to-video latent diffusion model. It starts from a still image and produces a short clip of 14 or 25 frames with controllable motion intensity. SVD kicked off the open-weights video-generation wave and underpins many of today's derivative community video tools.
Model specs
- Vendor
- Stability AI
- Family
- Stable Video
- Released
- 2023-11
- Context window
- Not applicable (image-to-video model, no text context)
- Modalities
- image (input), video (output)
Strengths
- Open weights under Stability AI research / commercial license
- Works well with standard Stable Diffusion tooling pipelines
- Community extensions for motion control and longer clips
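The tooling compatibility above can be sketched with Hugging Face `diffusers`, which ships a `StableVideoDiffusionPipeline`. A minimal sketch, assuming the `diffusers` package, downloaded weights, and a CUDA GPU are available; the helper name and defaults are illustrative, not an official API:

```python
def generate_clip(image_path: str, out_path: str = "clip.mp4", num_frames: int = 25):
    """Illustrative helper: animate a still image with SVD via diffusers."""
    # Heavy imports live inside the function so the sketch can be defined
    # (and read) without diffusers installed.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import load_image, export_to_video

    # img2vid-xt is the 25-frame checkpoint; use img2vid for 14 frames.
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid-xt",
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    image = load_image(image_path).resize((1024, 576))
    # decode_chunk_size trades VRAM for decoding speed when turning
    # denoised latents back into frames.
    frames = pipe(image, num_frames=num_frames, decode_chunk_size=8).frames[0]
    export_to_video(frames, out_path, fps=7)
    return out_path
```

Because SVD reuses the Stable Diffusion ecosystem, the same pattern of `from_pretrained` plus a pipeline call carries over from image workflows with almost no changes.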
Limitations
- Short clip length (14 or 25 frames) without community extensions
- Quality below commercial generators like Sora or Veo
- No audio — video only
Use cases
- Open-weights research on video diffusion
- Short animation prototyping from reference images
- Motion-controlled cinemagraphs and product shots
- Teaching video-diffusion architectures in class
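For the classroom use case, the model's working tensor shapes are easy to derive by hand. Assuming SVD inherits the standard Stable Diffusion VAE (8x spatial downsampling into 4 latent channels; an assumption about the architecture, stated here rather than sourced from this page), a sketch of the latent-video shape the diffusion backbone denoises:

```python
def svd_latent_shape(num_frames: int, width: int, height: int,
                     downsample: int = 8, channels: int = 4):
    """Latent video tensor shape, assuming the standard SD VAE factors."""
    # Pixel dimensions must divide evenly by the VAE downsampling factor.
    assert width % downsample == 0 and height % downsample == 0
    return (num_frames, channels, height // downsample, width // downsample)

# 25 frames at the maximum official 1024x576 resolution:
print(svd_latent_shape(25, 1024, 576))  # (25, 4, 72, 128)
```

The frame axis is what distinguishes SVD from its image-only ancestor: the denoiser attends across it, which is why clip length is baked into each checkpoint.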
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| Human preference vs. AnimateDiff (internal) | preferred by human raters in the majority of pairwise comparisons | 2023-11 |
Frequently asked questions
What is Stable Video Diffusion?
Stable Video Diffusion is Stability AI's open-weights image-to-video latent diffusion model, generating short 14- or 25-frame clips from a single reference image using a Stable Diffusion–based architecture.
How long are SVD clips?
Official checkpoints generate 14 or 25 frames at up to 1024x576 resolution. Community extensions stretch this further with interpolation and long-video pipelines.
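In wall-clock terms, the official frame counts work out to only a few seconds, assuming the 7 fps playback rate commonly used when exporting SVD clips (the exact fps is configurable at export time):

```python
def clip_seconds(num_frames: int, fps: int = 7) -> float:
    """Playback duration of a clip at a given frame rate."""
    return num_frames / fps

print(round(clip_seconds(14), 2))  # 2.0  seconds for the 14-frame checkpoint
print(round(clip_seconds(25), 2))  # 3.57 seconds for the 25-frame checkpoint
```

This is why interpolation and chained long-video pipelines are popular community extensions: the base checkpoints alone cover roughly two to four seconds of motion.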
What license covers SVD?
Stability AI ships SVD under its own research/commercial license; commercial use requires a paid Stability membership above certain revenue thresholds.
Sources
- Stability AI — Stable Video Diffusion — accessed 2026-04-20
- Hugging Face — stabilityai/stable-video-diffusion-img2vid — accessed 2026-04-20