🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
2mo
My current project of ultimate silliness is using omnivoice, gemma-4, bark, and ace-step to create a radio station that's entirely AI generated but runs locally without using the cloud. It's super buggy, so not sharing yet. But it can do foreground music, background music, foreground sound effects, background sound effects, and host dialogue with multiple hosts, all positioned with HRTF inside a studio. The hosts can use a browser to look stuff up, move themselves around the studio, and talk to each other. Sound effects and music are cached and reused. No, I don't expect to replace radio. It's more of an art project/way to torture people I don't like with a stream of endless audio slop. It's also proof of what can be done without a data center; a modern video card is enough to generate spoken dialogue, music, and sound effects all in close to real time. If you have 24 GB of VRAM you don't need an enormous data center to do everything you could possibly want.

The primary issue is that the longer it runs, the farther and farther the station deviates from the original prompt. It started out as a 24/7 news station. Within 20 minutes it generated and played a song with the lyrics "I go into the kitchen and what do I see? Round and happy, just like me. A potato! Yay!" Followed by one of the hosts saying "Oh my God why do I do this job. Please send help." Note that this is caused by bad sampler settings and poor prompting; a giant trillion parameter model wouldn't do any better than what I can run locally.

If I ever get this thing in releasable shape it'll serve as a kind of ultimate answer to the people who think AI needs nine million data centers. No, it really doesn't. One gaming computer is fine. The purpose of the data centers is to centralize control in the hands of the corporations, not because AI actually needs them.
Stevo @stevo399@dragonscave.space
2mo
@fastfinge Oh what the hell? That's super cool! I want that!
Martin @mcourcel@allovertheplace.ca
2mo
@fastfinge Now, if you can get the AI to report on the news on an hourly basis, that would be cool.
Spacedog @spacepup@mastodon.stickbear.me
2mo
@jaybird110127 @fastfinge Wish I could use ace-step, but my GPU only has 8 GB of VRAM.
Steve @sclower@stranger.social
2mo
@BorrisInABox @fastfinge I'd love to hear a sample of this anyway. The premise is hilarious.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
2mo
@sclower @BorrisInABox Right now I have bugs with the audio queues falling out of sync. I have queues for foreground audio and music, background music, and background sound effects. To avoid audio underruns, I spawn separate threads to generate content, and a manager swaps models in and out based on the size of each queue. Each queue then streams audio to a mixer function that does spatialization and so on. Unfortunately, when clips end up being different lengths than expected, the entire thing gets out of sync. Once I figure out a solution I’ll launch a live demo stream and release source code.
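As an illustration of the "manager swaps models based on queue size" idea in the post above, here is a minimal sketch of one possible refill policy. It is not the author's code (none has been released). The `AudioQueue` class, the `next_queue_to_fill` function, the `swap_penalty` parameter, and the model labels are all hypothetical. The one design choice worth noting is that queue size is measured in buffered seconds rather than clip count, so a clip that comes back shorter or longer than requested is accounted for automatically.

```python
from dataclasses import dataclass


@dataclass
class AudioQueue:
    name: str
    model: str               # hypothetical label for the model that fills this queue
    target_seconds: float    # how much audio we would like to keep buffered
    buffered_seconds: float = 0.0

    def push(self, clip_seconds: float) -> None:
        # Record the clip's real duration, not the duration that was requested.
        self.buffered_seconds += clip_seconds

    def consume(self, seconds: float) -> None:
        # Called as the mixer plays audio out of this queue.
        self.buffered_seconds = max(0.0, self.buffered_seconds - seconds)

    @property
    def fill_ratio(self) -> float:
        return self.buffered_seconds / self.target_seconds


def next_queue_to_fill(queues, loaded_model=None, swap_penalty=0.0):
    """Pick the queue closest to underrunning.

    A queue whose model is not currently loaded is treated as slightly
    fuller (by swap_penalty), so the manager avoids swapping models
    unless another queue is clearly in more trouble.
    Returns None when every queue is at or above its target.
    """
    needy = [q for q in queues if q.fill_ratio < 1.0]
    if not needy:
        return None

    def urgency(q):
        penalty = 0.0 if q.model == loaded_model else swap_penalty
        return q.fill_ratio + penalty

    return min(needy, key=urgency)
```

A generation thread would call `next_queue_to_fill` in a loop, load that queue's model if needed, generate one clip, and `push` its measured duration.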
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
2mo
@sclower @BorrisInABox Thanks to the nature of AI, sometimes asking for a 30 second bed returns a 3 second bed. Or a 300 second one. Then everything is a mess because the clip wasn’t the expected length and, worse, generation didn’t take the expected time, so now other queues might underrun.
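One common way to cope with clips of unpredictable length is to force every clip to the requested duration before it enters a queue. The sketch below is an assumption about how that could look, not the author's fix. The `fit_clip` function and its parameters are hypothetical, and it works on a plain list of mono float samples to stay dependency-free. Short clips are looped, long clips are cut, and a linear fade-out hides the cut.

```python
def fit_clip(samples, sample_rate, target_seconds, fade_seconds=0.5):
    """Return a new list of samples that is exactly target_seconds long.

    samples: mono audio as a list of floats (hypothetical representation).
    Short clips are looped, long clips are truncated, and the tail gets
    a linear fade-out so the cut does not click.
    """
    target_len = int(round(target_seconds * sample_rate))
    if target_len <= 0:
        return []
    if not samples:
        # Nothing was generated: fall back to silence of the right length.
        return [0.0] * target_len

    # Loop the clip enough times to cover the target, then cut to length.
    repeats = -(-target_len // len(samples))  # ceiling division
    out = (list(samples) * repeats)[:target_len]

    # Linear fade-out over the last fade_seconds (or the whole clip if shorter).
    fade_len = min(int(fade_seconds * sample_rate), target_len)
    for i in range(fade_len):
        gain = 1.0 - (i + 1) / fade_len
        out[target_len - fade_len + i] *= gain
    return out
```

This only addresses clip length. The second problem in the post, generation taking longer than expected, would still need something like a deadline on each generation job with a cached clip as the fallback.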
2mo
@fastfinge hi, uh, please send me a copy of this. I don't have that much VRAM, so I'm guessing it would have to generate things in non-realtime?
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
2mo
@freya Nope. It just wouldn’t run at all because the models won’t load. Acestep is the worst, with gemma4 a close second. Omnivoice and bark can get by with 12 GB of VRAM.
Patrick Perdue @BorrisInABox@fwoof.space
2mo
@fastfinge I wonder if that would run on a MacBook Pro M4 Max with 36GB of unified RAM. More potatoes?
Alan @Alan@dragonscave.space
2mo
@fastfinge is there a way you can show an example?
Majid Hussain @mhussain@universeodon.com
2mo
@fastfinge I love the sound of that.
In my younger years, I used to love radio DXing.
Sadly, what with the radio mergers, the DXing landscape has gone.
Here in the UK, AM is shutting down; there are only a few stations left now.
Also sadly, I don't have 24 GB of RAM to play with.
I only have 16. Could a budget version be run using, say, gemma4 e4b or something?
Man, if the station went off the rails within 20 minutes, could that be due to the context length?
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
2mo
@mhussain The big eater of VRAM is ace-step. Everything else could fit. I think it's due to a combination of context length issues, poor prompting, and some syncing bugs I have. Right now I have several different queues for audio (background, foreground, etc.). Sometimes they get out of sync, meaning playback sounds off, and the AIs get confused about what's playing where and when.