Stable Audio 2

Stable Audio 2, released by Stability AI in April 2024, extended the original Stable Audio to generate full-length musical compositions of up to three minutes with a coherent intro, development, and outro. It also introduced audio-to-audio conditioning: you upload a reference clip and prompt the model to reinterpret or extend it.

Model specs

Vendor: Stability AI
Family: Stable Audio
Released: 2024-04
Context window: 512 tokens
Modalities: text, audio

Strengths

Up to 3-minute coherent compositions in a single generation
Audio-to-audio conditioning for style transfer
Part of Stability AI's broader generative-media suite

Limitations

Available only through Stability AI's hosted commercial service; open weights exist only for the separate, narrower Stable Audio Open
Vocal generation restricted by rights considerations
Audio quality still falls short of professionally produced, human-composed music

Use cases

Music prototyping for creators and game studios
Sound-effect generation for video and games
Audio-to-audio remixes and style transfer
Research on generative audio models

Benchmarks

Benchmark	Score	As of
Internal listener preference vs. Stable Audio 1	preferred in majority of pairings	2024-04

Frequently asked questions

What is Stable Audio 2?

Stable Audio 2 is Stability AI's text-to-audio model, released in April 2024, capable of generating up to three-minute musical compositions with structured intros, development, and outros from a text prompt.

What is audio-to-audio conditioning?

You upload a reference audio clip and prompt Stable Audio 2 to reinterpret, extend, or stylise it. This is useful for remixes, mood shifts, and continuity across scenes.
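
As a rough illustration of the two modes described above, the sketch below models a generation request in Python and checks it against the three-minute ceiling before anything would be sent. It is not Stability AI's client library or API schema: the names GenerationRequest, build_request, and to_form_fields, and the field names "prompt", "duration", "mode", and "audio", are all hypothetical and chosen only for this example. The sketch makes no network call.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Three-minute ceiling taken from this article; not read from any official schema.
MAX_DURATION_SECONDS = 180


@dataclass(frozen=True)
class GenerationRequest:
    """Hypothetical request model; not Stability AI's actual API types."""
    prompt: str
    duration_seconds: int
    reference_audio_path: Optional[str] = None  # set only for audio-to-audio

    @property
    def mode(self) -> str:
        # A reference clip switches the request from text-to-audio to audio-to-audio.
        if self.reference_audio_path is not None:
            return "audio-to-audio"
        return "text-to-audio"


def build_request(prompt: str, duration_seconds: int,
                  reference_audio_path: Optional[str] = None) -> GenerationRequest:
    """Validate inputs locally and return an immutable request object."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds"
        )
    return GenerationRequest(prompt.strip(), duration_seconds, reference_audio_path)


def to_form_fields(request: GenerationRequest) -> Dict[str, str]:
    """Flatten the request into illustrative (assumed) form fields."""
    fields = {
        "prompt": request.prompt,
        "duration": str(request.duration_seconds),
        "mode": request.mode,
    }
    if request.reference_audio_path is not None:
        fields["audio"] = request.reference_audio_path
    return fields
```

For example, build_request("lo-fi remix with warm tape hiss", 90, "scene_theme.wav") yields an audio-to-audio request, while omitting the third argument yields a text-to-audio one. Checking the duration locally is a design convenience in this sketch; the hosted service's actual limits and parameter names should be taken from Stability AI's documentation.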

Can I use Stable Audio 2 commercially?

Commercial use goes through Stability AI's hosted service under specific licence terms. The open-weights counterpart, Stable Audio Open, covers research and hobbyist use with narrower capabilities.

Sources

Stability AI — Stable Audio 2 launch — accessed 2026-04-20
Stability AI — Stable Audio docs — accessed 2026-04-20