Stability AI on May 20 released Stable Audio 3.0, a four-model family for generating music and sound effects. The two larger models produce tracks longer than six minutes, and three of the four variants ship as open weights under a commercially permissive license.
The Stable Audio franchise had been quiet since Stable Audio 2.0 in early 2024. That release capped tracks at roughly 90 seconds and required a proprietary license. In the roughly two years since, Suno, Udio, and ElevenLabs built music-generation products that attracted large consumer audiences, while the Stable Audio line produced nothing developers could freely embed or fine-tune. Version 3.0 is the first release in the product line to change both constraints at once.
The family includes four models. 3.0 Small SFX targets short sound effects. 3.0 Small generates up to two minutes of music and, according to the announcement on Stability AI’s website, is the only model capable of full music composition running on-device and offline. 3.0 Medium and 3.0 Large both generate more than six minutes. Small, Small SFX, and Medium are available as open weights on Hugging Face. Large is available only through the Stability AI API.
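For teams choosing between the variants, the lineup described above can be written down as a small lookup table. The following is a minimal sketch, not anything published by Stability AI: the `Variant` class, the `music_candidates` helper, and the field names are hypothetical. The announcement gives only "more than six minutes" for Medium and Large, so 360 seconds is used as the longest duration the article confirms, not as a real cap, and no duration is stated for Small SFX.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Variant:
    name: str
    music: bool                    # False for the sound-effects model
    known_seconds: Optional[int]   # longest duration the article confirms; None if unstated
    open_weights: bool             # True if weights are on Hugging Face


# Values taken from the article; 360 is a lower bound ("more than six minutes").
FAMILY = [
    Variant("3.0 Small SFX", music=False, known_seconds=None, open_weights=True),
    Variant("3.0 Small", music=True, known_seconds=120, open_weights=True),
    Variant("3.0 Medium", music=True, known_seconds=360, open_weights=True),
    Variant("3.0 Large", music=True, known_seconds=360, open_weights=False),
]


def music_candidates(seconds: int, open_weights_only: bool = False) -> List[str]:
    """Return the music models confirmed to cover a track of the given length."""
    return [
        v.name
        for v in FAMILY
        if v.music
        and v.known_seconds is not None
        and v.known_seconds >= seconds
        and (v.open_weights or not open_weights_only)
    ]
```

Under these assumptions, a five-minute track that must run on self-hosted weights leaves only 3.0 Medium, while dropping the open-weights requirement adds the API-only 3.0 Large.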
The architectural change enabling longer output is a novel semantic-acoustic autoencoder that supports variable-length generation at per-second granularity. Prior Stable Audio models used fixed output lengths. The new architecture also supports audio inpainting: single-segment editing, multi-segment editing, and causal continuation, which extends a track beyond its original endpoint without restarting generation. LoRA fine-tuning documentation is published alongside the weights for Small and Medium, which means developers can adapt the models to custom audio libraries without a full retraining run.
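The three editing modes can be pictured as one per-second mask over the track, where each entry says whether that second is kept or regenerated. The sketch below only illustrates the idea. It does not reflect how Stability AI implements inpainting or how its API accepts edit regions; the `regeneration_mask` function and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def regeneration_mask(
    track_seconds: int,
    segments: Sequence[Tuple[int, int]] = (),
    continue_seconds: int = 0,
) -> List[bool]:
    """Build a per-second mask: True means regenerate, False means keep.

    segments are half-open (start, end) ranges in whole seconds.
    continue_seconds appends new seconds after the original endpoint.
    """
    if track_seconds < 0 or continue_seconds < 0:
        raise ValueError("durations must be non-negative")

    mask = [False] * track_seconds
    for start, end in segments:
        if not 0 <= start < end <= track_seconds:
            raise ValueError(f"segment ({start}, {end}) is outside the track")
        for second in range(start, end):
            mask[second] = True

    # Causal continuation: every appended second is new material.
    mask.extend([True] * continue_seconds)
    return mask
```

In this picture, single-segment editing passes one range, multi-segment editing passes several, and causal continuation leaves the original seconds untouched while extending the mask past the end of the track.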
The licensing position is the sharpest differentiator relative to Suno and Udio. Stability AI states all models were trained on fully licensed data, and that output ownership passes to the creator under the community license. Organizations generating more than one million dollars in annual revenue require an enterprise license, which also includes legal indemnification. Stability AI notes, accurately, that most competing open music models either restrict commercial use or were trained on unlicensed content.
The competitive context matters here. Suno and Udio both face ongoing copyright litigation in the United States, filed in 2024, over their training data. ElevenLabs has moved into music with products that carry similar ambiguity. A commercially licensed, open-weight model at six-minute length is a direct answer to the one structural gap that has kept frontier audio generation out of production developer stacks.
Stability AI also disclosed partnerships with Universal Music Group and Warner Music Group, though the announcement does not specify what those agreements cover: distribution, training data licensing, or both. The distinction matters for developers assessing downstream legal exposure.
ComfyUI integration is listed as forthcoming. The weights and API are available now.
Developers building audio products who have avoided music generation because of licensing uncertainty now have a credible open-weight alternative. Teams currently evaluating Suno’s API for ambient music or background scoring should benchmark Stable Audio 3.0 Medium against their existing pipeline before the end of this quarter, particularly if their annual revenue will eventually cross the one-million-dollar threshold that triggers the enterprise license.
Reported by Stability AI on 2026-05-20.