🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
So this looks like a high-quality, fast, natural-sounding, open source TTS system in Python. A key candidate for an NVDA addon. Unfortunately, I find addon development super confusing. Is there a good template to start from or something? github.com/thewh1teagle/kokoro-onnx
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
Here's a much longer example of the quality of speech Kokoro TTS generates. I really do think it might be a decent addon. The weird pauses are because I'm just giving it a big long string, rather than chunking it like I should. It generates this in real time on CPU, and faster on GPU. The code to generate it is as follows:
import soundfile as sf
from kokoro_onnx import Kokoro
from onnxruntime import InferenceSession

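# Try the ROCm (AMD GPU) provider first; fall back to CPU if it is unavailable.
session = InferenceSession("kokoro-v0_19.onnx", providers=["ROCMExecutionProvider", "CPUExecutionProvider"])
# Build the synthesizer from the existing ONNX Runtime session and the voices file.
kokoro = Kokoro.from_session(session, "voices.json")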
samples, sample_rate = kokoro.create(
"He wasn't sleeping very well, and he knew the people around him noticed, but he didn't know what to do about it. He had quietly gone to Madame Pomfrey, who had regretfully told him that Dreamless Sleep was highly addicting and that while she could give him the occasional dose, it would have to be spread out enough to prevent it from becoming addicting – meaning he could only take it one night out of every two weeks or so. It was one night more of productive sleep than he'd be getting otherwise, so he still did it, but it didn't help the larger issue. He wasn't under the effects of any nightmare-inducing Curses, potions, or other magical ailments, so there was nothing for Madame Pomfrey to do. The nightmares were coming from his own mind, and she was not a Mind-Healer. She'd offered to try and connect Harry with one, but when Harry discovered that it involved having someone else quite literally entering his mind with magic and helping him sort out things like trauma he couldn't. If Harry couldn't even tell Hermione the extent of what he'd suffered at the Dursley's, he wasn't about to let a stranger into his mind to see it. Let alone the 'adventures' of his Hogwarts years. So the nightmares persisted, and with the poor quality of sleep serving as the first domino, everything else slowly began to fall. His grades weren't slipping yet, but he was struggling with the study schedule Hermione had set out for them and doing his homework took more effort, more energy that he didn't have.", voice="af_sarah", speed=1.0, lang="en-us"
)
sf.write("audio.wav", samples, sample_rate)
print("Created audio.wav")
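The chunking mentioned above could look roughly like this. It is only a sketch: `split_into_chunks` and its `max_chars` limit are hypothetical names that are not part of kokoro-onnx, and the sentence splitting is a simple regex that will trip over abbreviations.

```python
import re


def split_into_chunks(text, max_chars=300):
    """Group whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to `kokoro.create(...)` with the same `voice`, `speed`, and `lang` arguments as above, and the resulting sample arrays written out one after another.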
Tamas G @Tamasg@mindly.social
1y
@fastfinge so can it only write direct-to-file, or could it also send raw PCM data to a callback, or have a way of reading a buffer it creates with that raw data? NVDA drivers would work infinitely simpler under that model. Sadly, no real template for one exists beyond just looking at the code for drivers like DECTalk, Eloquence, Sonata, etc. and basing it off them to see which pattern best fits that synth's way of operating on things.
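For illustration, the callback model described here might look like the sketch below. It assumes the synthesizer returns float samples in the range -1.0 to 1.0; `float_to_pcm16`, `feed_pcm`, and `on_audio` are hypothetical names, not part of kokoro-onnx or NVDA.

```python
import array


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to native-endian 16-bit signed PCM bytes."""
    # Clamp first so out-of-range samples do not overflow the 16-bit range.
    pcm = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    return pcm.tobytes()


def feed_pcm(samples, on_audio, block_size=1024):
    """Hand the audio to on_audio in blocks of at most block_size samples."""
    for start in range(0, len(samples), block_size):
        on_audio(float_to_pcm16(samples[start:start + block_size]))
```

A driver would pass in whatever function feeds its audio output as `on_audio`, so nothing has to touch the disk.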
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@Tamasg @tspivey So it looks like the repo is still super active. For this to be an addon, we want: streaming samples in real time, and an indication of when speech starts and stops. Anything else? I can open an issue to ask.
Tamas G @Tamasg@mindly.social
1y
@fastfinge @tspivey Yeah, I think so, plus a way to inject stop sequences mid-speech (so we could call a shut-up or stop from the main thread during playback). Callbacks for stop would be nice, though sometimes we can infer that just from the audio buffer closing itself, if synthesis is done in real time with speech fragments.
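A rough sketch of what stopping mid-speech could look like when speech is generated in fragments. Everything here is hypothetical: `speak`, `synthesize`, `play`, and the two callbacks are placeholder names rather than kokoro-onnx or NVDA APIs, and cancellation only takes effect between fragments.

```python
import threading


def speak(chunks, synthesize, play, stop_event, on_start=None, on_done=None):
    """Synthesize and play chunks in order.

    Returns True if every chunk was played, False if stop_event cancelled it.
    """
    if on_start:
        on_start()
    completed = True
    for chunk in chunks:
        # Another thread can call stop_event.set() to cancel between fragments.
        if stop_event.is_set():
            completed = False
            break
        play(synthesize(chunk))
    if on_done:
        on_done(completed)
    return completed
```

The main thread would keep a reference to the `threading.Event` and call `set()` on it to silence speech; shorter fragments mean a faster response to the stop request.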
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1y
@Tamasg @tspivey Issue is here if anyone wants to chime in: github.com/thewh1teagle/kokoro-onnx/issues/13