Stability AI and Arm Bring Offline Generative Audio to Smartphones

Stability AI, known for its Stable Diffusion text-to-image models, has collaborated with chip design giant Arm to bring generative audio AI capabilities to mobile devices.

Through the partnership, Stability AI has managed to run Stable Audio Open, its text-to-audio model, entirely on Arm CPUs. The model can generate sound effects, audio samples, and production elements in seconds, all on-device and without an internet connection.

Stability AI stated, “As generative AI becomes increasingly integral to both enterprises and professional creators alike, it’s crucial that our models and workflows are easily accessible everywhere builders build and creators create, providing seamless integration into their visual media production pipelines.”

To meet this demand, the company set out to run its models efficiently at the edge. Optimising Stable Audio Open for mobile devices proved challenging: when first tested on a device with an Arm CPU, the model took around 240 seconds to generate a clip.

By distilling the model and using Arm's software stack, including KleidiAI's int8 matmul kernels reached through ExecuTorch's XNNPACK delegate, the companies cut the time to generate an 11-second clip to under 8 seconds on Armv9 CPUs, roughly a 30x speed-up.
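
For readers curious what the ExecuTorch-plus-XNNPACK path looks like in practice, the sketch below lowers a small placeholder PyTorch module to an on-device ExecuTorch program with the XNNPACK backend, the delegate through which KleidiAI's Arm kernels are picked up. This is an illustrative example, not Stability AI's actual pipeline: the module, shapes, and file name are made up, the int8 quantisation step is omitted for brevity, and the API names assume a recent ExecuTorch release.

```python
# Sketch: export a PyTorch module and lower it to ExecuTorch's XNNPACK delegate,
# the backend that dispatches to optimised Arm CPU kernels (e.g. KleidiAI).
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower


class TinyBlock(torch.nn.Module):
    """Stand-in for a (distilled) audio-generation submodule; not the real model."""

    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(256, 256)

    def forward(self, x):
        return torch.relu(self.proj(x))


model = TinyBlock().eval()
example_inputs = (torch.randn(1, 256),)

# Capture the computation graph, then partition supported ops onto XNNPACK.
exported = torch.export.export(model, example_inputs)
program = to_edge_transform_and_lower(
    exported,
    partitioner=[XnnpackPartitioner()],
).to_executorch()

# Serialize the .pte program that the ExecuTorch runtime loads on the phone.
with open("tiny_block_xnnpack.pte", "wb") as f:
    f.write(program.buffer)
```

On-device, the saved .pte file is then loaded and run by the ExecuTorch C++ runtime, which is where the CPU-specific kernels determine how fast each generation step executes.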

Trying the capability requires a compatible mobile device, but since most smartphones today use Arm-based CPUs, it should be broadly accessible. Stability AI also plans to bring its image, video, and 3D models to the edge, aiming to transform how visual media is created on mobile devices.
