Stable Audio 2
Stable Audio 2, released by Stability AI in April 2024, extended the original Stable Audio to generate full-length musical compositions of up to three minutes with a coherent intro, development, and outro. It also introduced audio-to-audio conditioning: you upload a reference clip and prompt the model to reinterpret or extend it.
Model specs
- Vendor: Stability AI
- Family: Stable Audio
- Released: 2024-04
- Context window: 512 tokens
- Modalities: text, audio
Strengths
- Up to 3-minute coherent compositions in a single generation
- Audio-to-audio conditioning for style transfer
- Part of Stability AI's broader generative-media suite
Limitations
- Full model available only through Stability AI's closed commercial service; open weights exist only for the narrower Stable Audio Open counterpart
- Vocal generation restricted by rights considerations
- Audio quality still below top-tier human-composed music
Use cases
- Music prototyping for creators and game studios
- Sound-effect generation for video and games
- Audio-to-audio remixes and style transfer
- Research on generative audio models
Benchmarks
| Benchmark | Score | As of |
|---|---|---|
| Internal listener preference vs. Stable Audio 1 | preferred in majority of pairings | 2024-04 |
Frequently asked questions
What is Stable Audio 2?
Stable Audio 2 is Stability AI's text-to-audio model, released in April 2024, capable of generating up to three-minute musical compositions with structured intros, development, and outros from a text prompt.
What is audio-to-audio conditioning?
You upload a reference audio clip and prompt Stable Audio 2 to reinterpret, extend, or stylise it. This is useful for remixes, mood shifts, and continuity across scenes.
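As a rough illustration of the two generation modes, the sketch below assembles request parameters and enforces the three-minute ceiling described on this page. It makes no network call. The function name `build_request` and every field name are hypothetical, chosen for illustration only; they are not Stability AI's actual API, so consult the official docs for the real endpoint and parameters.

```python
from typing import Optional

# Three-minute ceiling taken from this page; not read from any official API schema.
MAX_DURATION_SECONDS = 180


def build_request(
    prompt: str,
    duration_seconds: int = 30,
    reference_clip: Optional[str] = None,
) -> dict:
    """Assemble a hypothetical generation request.

    All keys in the returned dict are illustrative assumptions.
    Passing a reference clip path selects audio-to-audio mode;
    omitting it selects plain text-to-audio.
    """
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt must not be empty")
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError(
            f"duration_seconds must be between 1 and {MAX_DURATION_SECONDS}"
        )

    mode = "audio-to-audio" if reference_clip else "text-to-audio"
    return {
        "mode": mode,
        "prompt": cleaned,
        "duration_seconds": duration_seconds,
        "reference_clip": reference_clip,
    }


if __name__ == "__main__":
    # Reinterpret an existing clip (the file name is a made-up example).
    print(build_request("lo-fi remix with warm tape hiss", 90, "scene_01.wav"))
    # Plain text-to-audio at the maximum length.
    print(build_request("orchestral theme with a quiet outro", 180))
```

Validating the duration client-side is a design convenience under these assumptions: it rejects an over-length request before any upload happens.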
Can I use Stable Audio 2 commercially?
Commercial use goes through Stability AI's hosted service under specific licence terms. The open-weights counterpart, Stable Audio Open, covers research and hobbyist use with narrower capabilities.
Sources
- Stability AI — Stable Audio 2 launch — accessed 2026-04-20
- Stability AI — Stable Audio docs — accessed 2026-04-20