Stable Audio 2

Stable Audio 2, released by Stability AI in April 2024, extended the original Stable Audio to generate full-length musical compositions up to three minutes, with coherent intro/development/outro structure. It also introduced audio-to-audio conditioning — uploading a reference clip and prompting the model to reinterpret or extend it.

Model specs

Vendor: Stability AI
Family: Stable Audio
Released: 2024-04
Context window: 512 tokens
Modalities: text, audio

Strengths

  • Up to 3-minute coherent compositions in a single generation
  • Audio-to-audio conditioning for style transfer
  • Part of Stability AI's broader generative-media suite

Limitations

  • Available only through Stability AI's closed commercial API; open weights exist only for the separate, narrower Stable Audio Open
  • Vocal generation restricted by rights considerations
  • Audio quality still below top-tier human-composed music

Use cases

  • Music prototyping for creators and game studios
  • Sound-effect generation for video and games
  • Audio-to-audio remixes and style transfer
  • Research on generative audio models

Benchmarks

Benchmark | Score | As of
Internal listener preference vs. Stable Audio 1 | preferred in majority of pairings | 2024-04

Frequently asked questions

What is Stable Audio 2?

Stable Audio 2 is Stability AI's text-to-audio model, released in April 2024, capable of generating up to three-minute musical compositions with structured intros, development, and outros from a text prompt.

What is audio-to-audio conditioning?

You upload a reference audio clip and prompt Stable Audio 2 to reinterpret, extend, or stylise it — useful for remixes, mood shifts, and continuity across scenes.
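As a rough illustration of the two modes described above, the following Python sketch models a generation request that is either text-only or carries a reference clip. Everything in it is hypothetical: the class name `GenerationRequest`, its fields, and the payload keys are invented for this example and are not Stability AI's actual API schema. Only the three-minute ceiling comes from the description of the model.

```python
from dataclasses import dataclass
from typing import Optional

# Three-minute ceiling, taken from the model description above.
MAX_DURATION_SECONDS = 180


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request shape; NOT Stability AI's real API schema."""

    prompt: str
    duration_seconds: int = MAX_DURATION_SECONDS
    # Path to a reference clip; set only for audio-to-audio conditioning.
    reference_clip_path: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject requests the model could not serve as described.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not 0 < self.duration_seconds <= MAX_DURATION_SECONDS:
            raise ValueError(
                f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
            )

    @property
    def mode(self) -> str:
        # A reference clip switches the request to audio-to-audio.
        return "audio-to-audio" if self.reference_clip_path else "text-to-audio"

    def to_payload(self) -> dict:
        # Assumed payload keys, for illustration only.
        payload = {
            "prompt": self.prompt,
            "duration_seconds": self.duration_seconds,
            "mode": self.mode,
        }
        if self.reference_clip_path:
            payload["reference_clip_path"] = self.reference_clip_path
        return payload


# Example: restyle an existing scene theme while keeping it to 90 seconds.
remix = GenerationRequest(
    prompt="darker, slower orchestral version of this theme",
    duration_seconds=90,
    reference_clip_path="scene_theme.wav",
)
print(remix.mode)
print(remix.to_payload())
```

Validating duration and mode on the client side like this is a design choice for the sketch, not something the hosted service is known to require; the real request format should be taken from Stability AI's documentation.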

Can I use Stable Audio 2 commercially?

Commercial use goes through Stability AI's hosted service and is subject to its licence terms. The open-weights counterpart, Stable Audio Open, covers research and hobbyist use with narrower capabilities.

Sources

  1. Stability AI — Stable Audio 2 launch — accessed 2026-04-20
  2. Stability AI — Stable Audio docs — accessed 2026-04-20