Stable Video Diffusion

Stable Video Diffusion (SVD), introduced by Stability AI in November 2023, is an open-weights image-to-video latent diffusion model. It takes a single still image and generates a short clip of 14 or 25 frames, with a setting that controls how much motion appears. SVD was an early, widely adopted open-weights video generator, and many community video tools are derived from it.

Model specs

Vendor: Stability AI
Family: Stable Video
Released: 2023-11
Context window: not applicable (input is a single image, not text tokens)
Modalities: vision (image input), video (output)

Strengths

  • Open weights under Stability AI's research and commercial license terms
  • Works well with standard Stable Diffusion tooling and pipelines
  • Community extensions for motion control and longer clips

Limitations

  • Short clip length (14 or 25 frames) without community extensions
  • Output quality below commercial generators such as Sora or Veo
  • No audio — video only

Use cases

  • Open-weights research on video diffusion
  • Short animation prototyping from reference images
  • Motion-controlled cinemagraphs and product shots
  • Teaching video-diffusion architectures in class

Benchmarks

Benchmark | Score | As of
Human preference vs. AnimateDiff (internal) | preferred in majority of pairings | 2023-11

Frequently asked questions

What is Stable Video Diffusion?

Stable Video Diffusion is Stability AI's open-weights image-to-video latent diffusion model, generating short 14- or 25-frame clips from a single reference image using a Stable Diffusion–based architecture.
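A minimal sketch of image-to-video generation, assuming the Hugging Face Diffusers library and its StableVideoDiffusionPipeline, a CUDA GPU, and access to the checkpoint named in the Sources list. The helper names svd_settings and generate_clip are hypothetical, and the default values (motion_bucket_id=127, fps=7, decode_chunk_size=8) are illustrative assumptions rather than recommendations from Stability AI.

```python
from pathlib import Path


def svd_settings(num_frames: int = 14, motion_bucket_id: int = 127, fps: int = 7) -> dict:
    """Collect generation settings (hypothetical helper, not part of any library)."""
    # The official checkpoints are built for 14 or 25 frames.
    if num_frames not in (14, 25):
        raise ValueError("official SVD checkpoints generate 14 or 25 frames")
    return {
        "num_frames": num_frames,
        "motion_bucket_id": motion_bucket_id,  # higher values request more motion
        "fps": fps,                            # frame-rate conditioning signal
        "decode_chunk_size": 8,                # decode a few frames at a time to limit VRAM use
    }


def generate_clip(image_path: str, out_path: str = "clip.mp4", **overrides) -> Path:
    """Sketch only: requires torch, diffusers, and a CUDA GPU."""
    # Heavy imports stay inside the function so the module loads without them.
    import torch
    from diffusers import StableVideoDiffusionPipeline
    from diffusers.utils import export_to_video, load_image

    settings = svd_settings(**overrides)
    pipe = StableVideoDiffusionPipeline.from_pretrained(
        "stabilityai/stable-video-diffusion-img2vid",  # 14-frame checkpoint from the Sources list
        torch_dtype=torch.float16,
        variant="fp16",
    ).to("cuda")

    # Resize the reference image to the model's native 1024x576 resolution.
    image = load_image(image_path).resize((1024, 576))
    frames = pipe(image, generator=torch.manual_seed(0), **settings).frames[0]
    export_to_video(frames, out_path, fps=settings["fps"])
    return Path(out_path)
```

Calling generate_clip("product.png", motion_bucket_id=60) would, under these assumptions, write a calmer 14-frame clip to clip.mp4; the file name product.png is a placeholder.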

How long are SVD clips?

Official checkpoints generate 14 or 25 frames at up to 1024x576 resolution. Community extensions stretch this further with interpolation and long-video pipelines.
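To put the frame counts in terms of playback time, the arithmetic can be sketched as below. The playback rate of 7 frames per second and the 2x interpolation factor are assumptions chosen for illustration; the source does not state either, and both function names are hypothetical.

```python
def clip_seconds(num_frames: int, fps: float) -> float:
    """Playback duration in seconds for a clip shown at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


def frames_after_interpolation(num_frames: int, factor: int) -> int:
    """Frame count after inserting (factor - 1) new frames between each adjacent pair."""
    if num_frames < 1 or factor < 1:
        raise ValueError("num_frames and factor must be at least 1")
    return (num_frames - 1) * factor + 1


print(clip_seconds(14, 7))                 # 2.0 seconds at an assumed 7 fps
print(frames_after_interpolation(25, 2))   # 49 frames after assumed 2x interpolation
```

At the assumed 7 fps, a 14-frame clip lasts 2 seconds and a 25-frame clip about 3.6 seconds, which is why interpolation and long-video pipelines are the usual route to longer output.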

What license covers SVD?

Stability AI distributes SVD under its own research/commercial license. Commercial use by organizations above certain revenue thresholds requires a paid Stability AI membership.

Sources

  1. Stability AI — Stable Video Diffusion — accessed 2026-04-20
  2. Hugging Face — stabilityai/stable-video-diffusion-img2vid — accessed 2026-04-20