🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
So this looks like a high-quality, fast, natural, and open source TTS system in Python. A key candidate for an NVDA addon. Unfortunately, I find addon development super confusing. Is there a good template to start from or something? github.com/thewh1teagle/kokoro-onnx
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
Yeah, I am deeply confused about how buffers work, how to indicate when speaking is complete, how to do indexing, and so on. If this is going to be an addon, someone else will have to do it.
Tyler Spivey @tspivey@dragonscave.space
1y
@fastfinge You need support from the synth for some features. This one doesn't have anything. Once it starts speaking, it blocks until it's done, so you can't interrupt it.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@tspivey Wouldn't you just stop playing the samples it gave you?
Tyler Spivey @tspivey@dragonscave.space
1y
@fastfinge That works. But you're still sitting there waiting a few seconds for it to finish generating them.
Tyler Spivey @tspivey@dragonscave.space
1y
@fastfinge Taking this sentence and passing it straight through, it pauses after "highly". That's not even that many words. "He had quietly gone to Madame Pomfrey, who had regretfully told him that Dreamless Sleep was highly addicting and that while she could give him the occasional dose, it would have to be spread out enough to prevent it from becoming addicting – meaning he could only take it one night out of every two weeks or so."
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@tspivey Hmmm, I assumed that was just because I was passing an enormous text block with multiple sentences. Hadn't tested with single sentences yet.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@tspivey Also, how does NVDA chunk text it passes to a synth? Even that's not really documented anywhere LOL. I think Kokoro inference would need to run in its own thread, so the thread could be killed when we wanted to stop speech rather than generating extra samples, and a new thread could be started so you could start the new speech quickly, like when someone's pressing down arrow rapidly. But I don't have the time, and I'm not smart enough.
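A minimal sketch of the threading idea in the post above, not anything taken from NVDA or kokoro-onnx. Python threads can't be killed safely, so this version cancels cooperatively: text is fed to the synth one chunk at a time, and a per-request flag is checked between chunks. The `synthesize` and `play` callables are hypothetical stand-ins for Kokoro inference and audio output.

```python
import threading


class SpeechWorker:
    """Runs a synth function on a background thread with cooperative cancel."""

    def __init__(self, synthesize, play):
        self._synthesize = synthesize  # hypothetical: str -> samples
        self._play = play              # hypothetical: samples -> None
        self._lock = threading.Lock()
        self._cancel = threading.Event()
        self._thread = None

    def speak(self, chunks):
        """Cancel whatever is speaking now, then start on the new chunks."""
        with self._lock:
            self._cancel.set()          # ask the old thread to stop
            cancel = threading.Event()  # fresh flag for the new request
            self._cancel = cancel
            self._thread = threading.Thread(
                target=self._run, args=(list(chunks), cancel), daemon=True
            )
            self._thread.start()

    def cancel(self):
        """Stop the current speech without starting anything new."""
        with self._lock:
            self._cancel.set()

    def wait(self):
        """Block until the most recently started request has finished."""
        thread = self._thread
        if thread is not None:
            thread.join()

    def _run(self, chunks, cancel):
        for chunk in chunks:
            if cancel.is_set():
                return
            samples = self._synthesize(chunk)
            if cancel.is_set():
                return  # drop samples generated for stale speech
            self._play(samples)
```

With this shape, rapid arrow presses each call `speak()` with the new line. The old thread still finishes the one chunk it is in the middle of generating, so the wasted work is bounded by the chunk size rather than by the whole text. A real driver would also need to stop audio that is already playing.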
Tyler Spivey @tspivey@dragonscave.space
1y
@fastfinge It doesn't. It leaves that up to the synth. If you're doing Say All, then it tries to split by sentence and does it badly.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@tspivey So if I cursor up onto a line with fifty thousand characters, that's why it just dies. Ah.
Tyler Spivey @tspivey@dragonscave.space
1y
@fastfinge Yep. There are workarounds for that; disabling NVDA's processing improves it.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@tspivey Yeah, I'm increasingly convinced that @x0 is correct, and this would need to be part of Sonata if this was going to happen at all. They seem to have solved those issues mostly.
Tyler Spivey @tspivey@dragonscave.space
1y
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@tspivey That gives me: TypeError: GlobalPlugin.script_toggleX.<locals>.<lambda>() got an unexpected keyword argument 'normalize'
Sean Randall @cachondo@defcon.social
1y
@fastfinge @tspivey my wife and I had an unexpected keyword argument a few years ago. She'd never heard of the word nomenclature.
I ended up with gnomes on a clay chair as a whacky present as a reminder of the utter ridiculousness of the discussion.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@cachondo @tspivey Sounds like when I discovered my boss had never heard the word cogitate. Native English speaker, with a master's degree. Go figure.
Sean Randall @cachondo@defcon.social
1y
@fastfinge @tspivey I read an article a while ago about a young American whose dad hadn't heard of a word she'd picked up at college. It wasn't a particularly complicated or unusual word, but much was made of it in this article.

I sometimes wish I had a searchable text file of everything my screen reader ever said.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@cachondo @tspivey Then we could feed it into ollama and have AI search it for us! LOL
Tyler Spivey @tspivey@dragonscave.space
1y
@fastfinge Ok, redownload and that should be fixed.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@tspivey Yup, fixed! Are there docs?
Tyler Spivey @tspivey@dragonscave.space
1y
@fastfinge Nope. There's a toggle.txt in the root of the addon, but I don't know how updated that is. This thing has been hacked on over the years.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@tspivey Yeah, I can tell. TonyML's earcons addon also breaks a bunch of the features rofl.
Zach Bennoui @ZBennoui@dragonscave.space
1y
@tspivey @fastfinge I'm not sure this is the reason for pausing, but the model has a total context size of 500 characters and will not do well with input longer than that. It may also just be bad training data, sentences not ending with correct punctuation, primarily trained on paragraphs, etc. I’ve trained many TTS models over the last few years and data quality is extremely important, something lacking in most open source TTS systems out there.
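One way to stay under a limit like the 500 characters mentioned above is to split text before it reaches the model. This is a rough sketch only: the 500 figure comes from the post and is not verified here, and the sentence-splitting regex is a simple heuristic, not how NVDA or kokoro-onnx actually splits text.

```python
import re

MAX_CHARS = 500  # figure quoted in the thread; treat it as an assumption


def chunk_text(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single over-long sentence gets hard-split on whitespace.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars)
            if cut <= 0:
                cut = max_chars  # no space to break on; cut mid-word
            piece, sentence = sentence[:cut], sentence[cut:].lstrip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        if not sentence:
            continue
        # Pack whole sentences together while they still fit.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Chunking like this would also bound the cost of landing on a single enormous line, since the synth only ever sees one small piece at a time.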
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@ZBennoui @tspivey I think it's something with the onnx implementation actually. The pytorch version doesn't have this issue. There's an open issue looking into it.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@tspivey That's why you start a session, so the model stays loaded in memory. Then I think you can actually stream output from onnxruntime byte by byte, I'm just not sure how.