Introducing Stable Audio

Built by Harmonai, Stability AI’s in-house audio lab, Stable Audio was trained on a dataset of 800,000 audio clips, totaling 19,500 hours, licensed from audio partner AudioSparx.

Just as Stable Diffusion generates images from text, Stable Audio generates audio from natural language prompts specifying genre, tempo, instruments, mood, and other attributes. For example, a user could input “Disco, synthesizer, drums, 120 BPM, orchestral, piano, guitar” to get a matching audio clip.

In our early tests, Stable Audio showed significant quality improvements over previous AI music generators, with less noise and fewer compression artifacts. However, the instrumentation still sounds more haphazard than in human-composed music.

Commercial Release

For Stable Audio, Stability AI has adopted a subscription model similar to Midjourney’s. The free tier permits generating 20 audio clips per month of up to 45 seconds each, while the $11.99 monthly tier allows 500 clips of up to 90 seconds each that can be used commercially.
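To put the two tiers side by side, the quotas above can be turned into a monthly audio budget and a per-clip price. The short Python sketch below does only that arithmetic. The tier labels ("free", "paid") and function names are made up for illustration, and it assumes the $11.99 price is charged per month and that every clip uses its full allowed length.

```python
# Quotas as quoted in the article; the dictionary keys are illustrative labels.
TIERS = {
    "free": {"clips_per_month": 20, "max_seconds": 45, "price_usd": 0.00},
    "paid": {"clips_per_month": 500, "max_seconds": 90, "price_usd": 11.99},
}


def monthly_audio_minutes(tier):
    """Upper bound on generated audio per month, in minutes."""
    return tier["clips_per_month"] * tier["max_seconds"] / 60


def price_per_clip(tier):
    """Cost per clip in USD, assuming the whole monthly quota is used."""
    return tier["price_usd"] / tier["clips_per_month"]


if __name__ == "__main__":
    for name, tier in TIERS.items():
        print(
            f"{name}: up to {monthly_audio_minutes(tier):.1f} min/month, "
            f"${price_per_clip(tier):.4f} per clip"
        )
```

Under these assumptions the free tier tops out at 15 minutes of audio per month, while the paid tier allows up to 750 minutes (12.5 hours) at a little over two cents per clip.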

Surprisingly, given its open-source ethos, Stability AI has not open-sourced the model. But the company promises that Harmonai will later release another audio model trained on different data, along with the Stable Audio code so that others can train their own models.

Stability AI also notes that its training methodology avoids choppy, abruptly cut-off output: each training sample carries metadata on the clip’s total duration and the sample’s start time, which lets the model generate continuous audio of a requested length.
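As a rough illustration of the timing metadata described above, the Python sketch below shows how a training pipeline might record a start time and a total duration for each randomly cropped chunk, and how the same two values could be set when requesting a clip. This is a minimal sketch under stated assumptions, not Stability AI's code: the names TimingMetadata, seconds_start, seconds_total, sample_chunk_timing and request_timing are hypothetical, and the window length is a free parameter.

```python
import random
from dataclasses import dataclass


@dataclass
class TimingMetadata:
    """Hypothetical conditioning values attached to one audio chunk."""
    seconds_start: float  # where the chunk begins inside the full clip
    seconds_total: float  # duration of the full clip


def sample_chunk_timing(clip_seconds, window_seconds, rng):
    """Pick a random training window inside a clip and record its timing.

    Clips shorter than the window always start at 0.0; padding the audio
    itself is outside the scope of this sketch.
    """
    latest_start = max(0.0, clip_seconds - window_seconds)
    start = rng.uniform(0.0, latest_start)
    return TimingMetadata(seconds_start=start, seconds_total=clip_seconds)


def request_timing(requested_seconds):
    """Timing values for generation: start at 0 and ask for a full clip."""
    return TimingMetadata(seconds_start=0.0, seconds_total=requested_seconds)


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative numbers only: a 180-second clip and a 90-second window.
    print(sample_chunk_timing(180.0, 90.0, rng))
    print(request_timing(45.0))
```

The idea is that a model which sees these two numbers during training can learn what the beginning, middle, and end of a clip sound like, so a request that starts at zero with a given total duration can yield a clip that ends naturally rather than stopping mid-phrase.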

Applications in Gaming

AI-generated music has potential in gaming, where cinematic soundtracks are becoming increasingly important. However, most game studios still underinvest in audio. Audio departments remain small compared to CGI art teams, which limits the financial incentive to adopt AI tools.

For now, AI-generated content (AIGC) must also compete against mature commercial audio libraries and inexpensive outsourcing. But with continued progress, AI could someday produce AAA-quality soundtracks at scale, making it indispensable for game creators.

Stable Audio demonstrates Stability AI’s growing capabilities in synthesized audio. As models improve, AI music generation may significantly impact many media and entertainment sectors.