Note by @fastfinge

My current project of ultimate silliness is using omnivoice, gemma-4, bark, and ace-step to create a radio station that's entirely AI generated but runs locally without using the cloud. It's super buggy, so not sharing yet. But it can do foreground music, background music, foreground sound effects, background sound effects, and host dialogue with multiple hosts, all positioned with HRTF inside a studio. The hosts can use a browser to look stuff up, move themselves around the studio, and talk to each other. Sound effects and music are cached and reused. No, I don't expect to replace radio. It's more of an art project/way to torture people I don't like with a stream of endless audio slop. Also proof of what can be done without a data center; a modern video card is enough to generate spoken dialogue, music, and sound effects all in close to real time. If you have 24 gb of VRAM you don't need an enormous data center to do everything you could possibly want.

The primary issue is that the longer it runs, the farther and farther the station deviates from the original prompt. It started out as a 24/7 news station. Within 20 minutes it generated and played a song with the lyrics "I go into the kitchen and what do I see? Round and happy, just like me. A potato! Yay!" Followed by one of the hosts saying "Oh my God why do I do this job. Please send help." Note that this is caused by bad sampler settings and poor prompting; a giant trillion parameter model wouldn't do any better than what I can run locally.

If I ever get this thing in releasable shape it'll serve as a kind of ultimate answer to the people who think AI needs nine million data centers. No, it really doesn't. One gaming computer is fine. The purpose of the data centers is to centralize control in the hands of the corporations, not because AI actually needs them.