1y
So this looks like a high-quality, fast, natural-sounding, open-source TTS system in Python. A key candidate for an addon. Unfortunately, I find addon development super confusing. Is there a good template to start from or something? github.com/thewh1teagle/kokoro-onnx
1y
Here's a much longer example of the quality of speech Kokoro TTS generates. I really do think it might be a decent addon. The weird pauses are because I'm just giving it a big long string, rather than chunking it like I should. It generates this in real time on CPU, and faster on GPU. The code to generate it is as follows:
import soundfile as sf
from kokoro_onnx import Kokoro
from onnxruntime import InferenceSession

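# Prefer the ROCm (AMD GPU) provider; fall back to CPU if it is unavailable.
session = InferenceSession("kokoro-v0_19.onnx", providers=["ROCMExecutionProvider", "CPUExecutionProvider"])
# Wrap the ONNX session together with the voice data file.
kokoro = Kokoro.from_session(session, "voices.json")
# Synthesize the whole passage in one call (no chunking).
samples, sample_rate = kokoro.create(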
"He wasn't sleeping very well, and he knew the people around him noticed, but he didn't know what to do about it. He had quietly gone to Madame Pomfrey, who had regretfully told him that Dreamless Sleep was highly addicting and that while she could give him the occasional dose, it would have to be spread out enough to prevent it from becoming addicting – meaning he could only take it one night out of every two weeks or so. It was one night more of productive sleep than he'd be getting otherwise, so he still did it, but it didn't help the larger issue. He wasn't under the effects of any nightmare-inducing Curses, potions, or other magical ailments, so there was nothing for Madame Pomfrey to do. The nightmares were coming from his own mind, and she was not a Mind-Healer. She'd offered to try and connect Harry with one, but when Harry discovered that it involved having someone else quite literally entering his mind with magic and helping him sort out things like trauma he couldn't. If Harry couldn't even tell Hermione the extent of what he'd suffered at the Dursley's, he wasn't about to let a stranger into his mind to see it. Let alone the 'adventures' of his Hogwarts years. So the nightmares persisted, and with the poor quality of sleep serving as the first domino, everything else slowly began to fall. His grades weren't slipping yet, but he was struggling with the study schedule Hermione had set out for them and doing his homework took more effort, more energy that he didn't have.", voice="af_sarah", speed=1.0, lang="en-us"
)
sf.write("audio.wav", samples, sample_rate)
print("Created audio.wav")
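Since the weird pauses come from passing one big long string, here is a minimal sketch of the chunking I should be doing: split the text at sentence boundaries and group sentences into chunks below a size limit. The helper name `chunk_text` and the `max_chars` value are my own placeholders, not part of kokoro-onnx, and the regex is a naive splitter that will trip on abbreviations like "Mr.".

```python
import re

def chunk_text(text, max_chars=300):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Untested usage idea, reusing the kokoro object from the snippet above:
#   parts = [kokoro.create(c, voice="af_sarah", speed=1.0, lang="en-us")[0]
#            for c in chunk_text(long_text)]
#   samples = numpy.concatenate(parts)
```

Each chunk would then be synthesized separately and the resulting sample arrays joined before writing the WAV file. Whether that actually removes the pauses is something I'd still have to test.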
Sean Randall @cachondo@defcon.social
1y
@fastfinge she's quite pleasant. I regret that I have now read so much fanfic that I can't immediately identify that one, though.
1y
@cachondo Here's a UK English sample of the same text. It sounds fine to my ear, but I'm not British. Thoughts?
Sean Randall @cachondo@defcon.social
1y
@fastfinge it's a very nice neutral sort of an accent. It goes a bit funny on the ends of some words; "one" and "so" are good examples in that sample. But I can see it being a great option for people who want more human-sounding voices.
1y
@cachondo So looking at it, it looks like it just uses the phonemes generated by espeak, and passes those to the natural voices. So if you use a voice trained on American English, and ask for en-gb, it'll do it anyway and sound terrible.
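One way to avoid that mismatch would be to derive the language from the voice name instead of passing both by hand. This is only a sketch: it assumes the naming convention where the first letter of a voice name marks its accent ("a" for American, "b" for British, as in "af_sarah" or "bm_george"), and `lang_for_voice` is a made-up helper, not something kokoro-onnx provides.

```python
# Assumption: the first letter of a Kokoro voice name encodes its accent.
ACCENT_TO_LANG = {"a": "en-us", "b": "en-gb"}

def lang_for_voice(voice):
    """Return the espeak language code matching the accent of a voice name."""
    try:
        return ACCENT_TO_LANG[voice[0].lower()]
    except (KeyError, IndexError):
        # Unknown prefix or empty name: refuse to guess.
        raise ValueError(f"Cannot infer a language for voice {voice!r}") from None
```

The result could then be passed as the `lang` argument alongside the voice, so an American voice never gets British phonemes by accident.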

Sean Randall @cachondo@defcon.social
1y
@fastfinge haha that's rather funny.
One of the biggest complaints from users new to screen reading when I taught was the quality of the available voices. The school paid for Vocalizer, I think, but that was as good as it got. I did get a few people onto the neural stuff, but it was in its infancy when I left.
This sounds really smooth in comparison.