Note by @fastfinge

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

1y

So this looks like a high quality, fast, natural, and open source TTS system in Python. A key candidate for an #NVDA #addon. Unfortunately, I find #nvdasr addon development super confusing. Is there a good template to start from or something? github.com/thewh1teagle/kokoro-onnx

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

1y

Here's a much longer example of the quality of speech Kokoro TTS generates. I really do think it might be a decent #NVDA addon. The weird pauses are because I'm just giving it a big long string, rather than chunking it like I should. It generates this in real time on CPU, and faster on GPU. The code to generate it is as follows:
import soundfile as sf
from kokoro_onnx import Kokoro
from onnxruntime import InferenceSession

session = InferenceSession("kokoro-v0_19.onnx", providers=["ROCMExecutionProvider", "CPUExecutionProvider"])
kokoro = Kokoro.from_session(session, "voices.json")
samples, sample_rate = kokoro.create(
"He wasn't sleeping very well, and he knew the people around him noticed, but he didn't know what to do about it. He had quietly gone to Madame Pomfrey, who had regretfully told him that Dreamless Sleep was highly addicting and that while she could give him the occasional dose, it would have to be spread out enough to prevent it from becoming addicting – meaning he could only take it one night out of every two weeks or so. It was one night more of productive sleep than he'd be getting otherwise, so he still did it, but it didn't help the larger issue. He wasn't under the effects of any nightmare-inducing Curses, potions, or other magical ailments, so there was nothing for Madame Pomfrey to do. The nightmares were coming from his own mind, and she was not a Mind-Healer. She'd offered to try and connect Harry with one, but when Harry discovered that it involved having someone else quite literally entering his mind with magic and helping him sort out things like trauma he couldn't. If Harry couldn't even tell Hermione the extent of what he'd suffered at the Dursley's, he wasn't about to let a stranger into his mind to see it. Let alone the 'adventures' of his Hogwarts years. So the nightmares persisted, and with the poor quality of sleep serving as the first domino, everything else slowly began to fall. His grades weren't slipping yet, but he was struggling with the study schedule Hermione had set out for them and doing his homework took more effort, more energy that he didn't have.", voice="af_sarah", speed=1.0, lang="en-us"
)
sf.write("audio.wav", samples, sample_rate)
print("Created audio.wav")
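The chunking mentioned above could look something like the following. This is only a minimal sketch, not part of kokoro-onnx: `split_into_chunks` and `max_chars` are hypothetical names, and splitting on sentence-ending punctuation is an assumption about what makes a good chunk boundary.

```python
import re

def split_into_chunks(text, max_chars=200):
    """Split text at sentence boundaries and pack sentences into chunks.

    A chunk never exceeds max_chars unless a single sentence is longer
    than max_chars, in which case that sentence is kept whole.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then go through its own `kokoro.create(chunk, voice="af_sarah", speed=1.0, lang="en-us")` call, with the resulting samples written or played back to back.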

Tamas G @Tamasg@mindly.social

1y

@fastfinge So can it only write direct-to-file, or could it also send raw PCM data to a callback, or have a way of reading a buffer it creates with that raw data? NVDA drivers would work infinitely simpler under that model. Sadly, no real template for one exists beyond just looking at the code for drivers like DECtalk, Eloquence, Sonata, etc., and basing it off them to see which pattern best fits that synth's way of operating on things.
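As a rough illustration of the callback model described here, the sketch below turns a block of float samples into raw 16-bit PCM and hands it to a callback in fixed-size pieces. It assumes the synth returns floats in the range -1.0 to 1.0; `float_to_pcm16`, `feed_pcm`, and `on_audio` are hypothetical names, not part of kokoro-onnx or NVDA.

```python
import struct

def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    # Clip first so out-of-range samples cannot overflow the 16-bit range.
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))

def feed_pcm(samples, on_audio, frames_per_block=1024):
    """Pass mono PCM to on_audio in blocks of at most frames_per_block frames."""
    pcm = float_to_pcm16(samples)
    block_bytes = frames_per_block * 2  # 2 bytes per 16-bit mono frame
    for start in range(0, len(pcm), block_bytes):
        on_audio(pcm[start:start + block_bytes])
```

A driver could pass its own audio player's feed method as `on_audio`, so nothing has to touch the disk.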

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

1y

@Tamasg @tspivey So it looks like the repo is still super active. For this to be an addon, we want: streaming samples in real time, and an indication of when speech starts and stops. Anything else? I can open an issue to ask.

Tamas G @Tamasg@mindly.social

1y

@fastfinge @tspivey I think yeah, a way to inject stop sequences mid-speech as well (so we could call a shut-up or stop from the main thread during playback) - having callbacks for stop can be nice, sometimes we can gather that just on the basis of the audio buffer closing itself if that's done in realtime with speech fragments.
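One way to picture the stop mechanism described here is a flag that is checked between speech fragments. This is only a sketch under assumptions: `InterruptibleSpeaker`, `synthesize`, `play`, and `on_done` are hypothetical names, and a real driver would run `speak` on a background thread while the main thread calls `cancel`.

```python
import threading

class InterruptibleSpeaker:
    """Plays text fragments one at a time and can be stopped between fragments."""

    def __init__(self, synthesize, play):
        self._synthesize = synthesize  # hypothetical: text fragment -> audio
        self._play = play              # hypothetical: plays one audio block
        self._stop = threading.Event()

    def speak(self, chunks, on_done=None):
        """Speak each chunk in order; return False if cancelled partway."""
        self._stop.clear()
        completed = True
        for chunk in chunks:
            if self._stop.is_set():
                completed = False
                break
            self._play(self._synthesize(chunk))
        if on_done is not None:
            on_done(completed)  # the "speech stopped" notification
        return completed

    def cancel(self):
        """Safe to call from another thread; takes effect before the next chunk."""
        self._stop.set()
```

With this shape, the stop latency is bounded by the length of one fragment, which is another reason to keep fragments short.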
