Note by @fastfinge

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

So this looks like a high quality, fast, natural, and open source TTS system in Python. A key candidate for an #NVDA #addon. Unfortunately, I find #nvdasr addon development super confusing. Is there a good template to start from or something? github.com/thewh1teagle/kokoro-onnx

🍂Melissa🌠 @EmeraldRose@dragonscave.space

@fastfinge Ooh, that does sound good. I'd use that.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

Here's a much longer example of the quality of speech Kokoro TTS generates. I really do think it might be a decent #NVDA addon. The weird pauses are because I'm just giving it a big long string, rather than chunking it like I should. It generates this in real time on CPU, and faster on GPU. The code to generate it is as follows:
import soundfile as sf
from kokoro_onnx import Kokoro
from onnxruntime import InferenceSession

session = InferenceSession("kokoro-v0_19.onnx", providers=["ROCMExecutionProvider", "CPUExecutionProvider"])
kokoro = Kokoro.from_session(session, "voices.json")
samples, sample_rate = kokoro.create(
"He wasn't sleeping very well, and he knew the people around him noticed, but he didn't know what to do about it. He had quietly gone to Madame Pomfrey, who had regretfully told him that Dreamless Sleep was highly addicting and that while she could give him the occasional dose, it would have to be spread out enough to prevent it from becoming addicting – meaning he could only take it one night out of every two weeks or so. It was one night more of productive sleep than he'd be getting otherwise, so he still did it, but it didn't help the larger issue. He wasn't under the effects of any nightmare-inducing Curses, potions, or other magical ailments, so there was nothing for Madame Pomfrey to do. The nightmares were coming from his own mind, and she was not a Mind-Healer. She'd offered to try and connect Harry with one, but when Harry discovered that it involved having someone else quite literally entering his mind with magic and helping him sort out things like trauma he couldn't. If Harry couldn't even tell Hermione the extent of what he'd suffered at the Dursley's, he wasn't about to let a stranger into his mind to see it. Let alone the 'adventures' of his Hogwarts years. So the nightmares persisted, and with the poor quality of sleep serving as the first domino, everything else slowly began to fall. His grades weren't slipping yet, but he was struggling with the study schedule Hermione had set out for them and doing his homework took more effort, more energy that he didn't have.", voice="af_sarah", speed=1.0, lang="en-us"
)
sf.write("audio.wav", samples, sample_rate)
print("Created audio.wav")
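
The chunking mentioned above could look roughly like this. It is a minimal sketch, not part of kokoro-onnx: `split_sentences` is a hypothetical helper, and the regex is a naive split that will trip over abbreviations such as "Dr." or "e.g.".

```python
import re

def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

text = "He wasn't sleeping very well. The nightmares persisted! Why?"
for sentence in split_sentences(text):
    # Each sentence would be passed to kokoro.create() on its own,
    # and the returned sample arrays played or concatenated in order.
    print(sentence)
```

Feeding one sentence at a time should also shorten the wait before the first audio arrives, since the model only has to finish a short input before playback can begin.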

Lynette @lynessence@caneandable.social

@fastfinge Oh that’s an 11 labs voice. Nice.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@lynessence Is it? I don't know what their built-in voices are now. Guess that's where they got the training data.

Lynette @lynessence@caneandable.social

@fastfinge Yes. I just listened to a book using that voice in ElevenReader. It’s not bad.

Lynette @lynessence@caneandable.social

@fastfinge I have tried to speed those 11 lab voices up quite a bit using ElevenReader, and I don’t really like the results. Curious to hear what you think.

Brandon @serrebi@tweesecake.social

@fastfinge I agree it's pretty good

Sean Randall @cachondo@defcon.social

@fastfinge she's quite pleasant. I regret that I have now read so much fanfic that I can't immediately identify that one, though.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@cachondo Here's a UK English sample of the same text. It sounds fine to my ear, but I'm not British. Thoughts?

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@cachondo Oh, and the fanfic is: Harry Potter and the Art of Getting Your Shit Together — by MsStarryNightSky.

Andre Louis @FreakyFwoof@universeodon.com

@fastfinge @cachondo I don't think this is one @SeveraSnape and I have. Sounds new to me too and that's surprising considering how much I read.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@FreakyFwoof @SeveraSnape @cachondo It's a Harry/Hermione fluff fic, so probably something you'd both enjoy. I'm surprised you don't already have it. Harry Potter and the Art of Getting Your Shit Together
Posted originally on the Archive of Our Own at archiveofourown.org/works/59310490.

Katy T @SeveraSnape@mastodon.social

@FreakyFwoof @fastfinge @cachondo I don't think we do, but we shall in a minute.

Sean Randall @cachondo@defcon.social

@SeveraSnape @FreakyFwoof @fastfinge still in progress, isn't it?

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@cachondo @SeveraSnape @FreakyFwoof Yup.

Andre Louis @FreakyFwoof@universeodon.com

@SeveraSnape @fastfinge @cachondo I platonically love this girl. Always on top of things. Mad respect.

Sean Randall @cachondo@defcon.social

@FreakyFwoof @SeveraSnape @fastfinge the dedication is awesome. I need a way of knowing when the in-progress things are done, seriously. Do I perhaps need to make an ao3 account or something?

Andre Louis @FreakyFwoof@universeodon.com

@cachondo @SeveraSnape @fastfinge I had to, because I follow sooooo many things on both ff net and ao3, so I set up a rule to forward any HP-related emails to a dedicated folder. I set it such that if the emails do not contain the text 'Harry Potter' it immediately deletes them. It's cut down on my inbox clutter 10-fold if not more, and the folder of fics is purely HP-related. I loves it.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@cachondo @SeveraSnape @FreakyFwoof Yeah, and get fanficfare configured on a server somewhere. It can monitor an imap account, find emails from FF and AO3, and auto-update your epub files.

Sean Randall @cachondo@defcon.social

@fastfinge @SeveraSnape @FreakyFwoof Blimey, that is clever.

Katy T @SeveraSnape@mastodon.social

@cachondo @fastfinge @FreakyFwoof It does help clean up the inbox.

Katy T @SeveraSnape@mastodon.social

@cachondo @FreakyFwoof @fastfinge For Andre's folder, I've begun putting status in there. I have visions of a change log like I do for mine... but there's so much I'm doing to his folder right now it would just be crazy. But you would only need an account if you want to get author alerts. And... if you follow a bunch then you would get lots of emails.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@FreakyFwoof @SeveraSnape @cachondo Same. Although apparently she drinks coke in the morning instead of coffee! So I dunno... LOL JK

Katy T @SeveraSnape@mastodon.social

@fastfinge @FreakyFwoof @cachondo hahahahaha!!!! I do indeed, but I drink tea too... does that count toward the good? :)

Andre Louis @FreakyFwoof@universeodon.com

@SeveraSnape @fastfinge @cachondo It undoes a lot of the bad, let's just put it this way haha

Katy T @SeveraSnape@mastodon.social

@FreakyFwoof @fastfinge @cachondo Oh man!!!! hahahahha!

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@SeveraSnape @cachondo @FreakyFwoof Depends. Peppermint? What kind of tea? LOL

Katy T @SeveraSnape@mastodon.social

@fastfinge @cachondo @FreakyFwoof Black tea. Like mint too, though that's not my go to. I like a bunch of different kinds.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@SeveraSnape @cachondo @FreakyFwoof Coffee is the one true caffeine delivery system, and mint tea is the only acceptable hot drink without caffeine. ROFL

Katy T @SeveraSnape@mastodon.social

@fastfinge @cachondo @FreakyFwoof Coffee puts me right to sleep.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@SeveraSnape @cachondo @FreakyFwoof Really? I hear that's one of the primary symptoms of people having ADHD.

Katy T @SeveraSnape@mastodon.social

@fastfinge @cachondo @FreakyFwoof So I've heard. I don't care to figure that out. I focus just fine on what I need to. hahaha.

Ather Jammoa @atherjammoa@mastodon.social

@SeveraSnape @fastfinge @cachondo @FreakyFwoof You should add a dash of honey. You won't regret it!

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@atherjammoa @SeveraSnape @cachondo @FreakyFwoof That's what I do whenever I have a sore throat.

Ather Jammoa @atherjammoa@mastodon.social

@fastfinge @SeveraSnape @cachondo @FreakyFwoof Lol yeah same.

Katy T @SeveraSnape@mastodon.social

@FreakyFwoof @fastfinge @cachondo Yay! Thanks!

JamminJerry @JamminJerry@mastodon.stickbear.me

@SeveraSnape @FreakyFwoof @fastfinge @cachondo I am just waiting to see it show up in a folder somewhere, I want to read that one. sounds like a good one.

Katy T @SeveraSnape@mastodon.social

@JamminJerry @FreakyFwoof @fastfinge @cachondo It's in Andre's under the author folder mentioned.

JamminJerry @JamminJerry@mastodon.stickbear.me

@SeveraSnape @FreakyFwoof @fastfinge @cachondo ah, there it is. thanks for that.

JamminJerry @JamminJerry@mastodon.stickbear.me

@SeveraSnape @FreakyFwoof @fastfinge @cachondo holy cow! those are going to be some very long chapters! around 1.7 meg file, and only 20 chapters. just wow!

Sean Randall @cachondo@defcon.social

@fastfinge it's a very nice neutral sort of an accent. Goes a bit funny on the ends of some words; "one" and "so" are good examples in that sample. But I can see it being a great option for people who want more human-sounding voices.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@cachondo So looking at it, it looks like it just uses the phonemes generated by espeak, and passes those to the natural voices. So if you use a voice trained on American English, and ask for en-gb, it'll do it anyway and sound terrible.

Sean Randall @cachondo@defcon.social

@fastfinge haha that's rather funny.
One of the biggest complaints from users new to screen reading when I taught was the quality of the available voices. The school paid for vocaliser I think but that was as good as it got. I did get a few people onto the neural stuff, but it was in its infancy when I left.
This sounds really smooth in comparison.

Andre Louis @FreakyFwoof@universeodon.com

@fastfinge This is nice. Yeah, this would be a nice addon.

Tamas G @Tamasg@mindly.social

@fastfinge So can it only write direct-to-file, or could it also pass raw PCM data to a callback, or have a way of reading a buffer it creates with that raw data? NVDA drivers would work infinitely simpler under that model. Sadly no real template for one exists beyond just looking at the code for drivers like DECTalk or Eloquence, Sonata, etc. and basing it off them to see which pattern best fits that synth's way of operating on things.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@Tamasg It returns samples by default. I'm just using a python library to write them to a file.
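
For the raw PCM question, one way to turn returned samples into bytes is sketched below. It assumes the samples are floats in the range -1.0 to 1.0 (an assumption about the output, not something confirmed in this thread), and `floats_to_pcm16` is a hypothetical helper name. It uses only the standard library so it runs without the model installed.

```python
import struct

def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    # Clip first so out-of-range values do not overflow the 16-bit range.
    clipped = (max(-1.0, min(1.0, float(s))) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)

pcm = floats_to_pcm16([0.0, 0.5, -1.0, 1.5])
print(len(pcm))  # 4 samples * 2 bytes each
```

The resulting bytes could then be handed to any player that accepts raw mono 16-bit audio at the sample rate returned alongside the samples.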

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@Tamasg @tspivey So it looks like the repo is still super active. For this to be an addon, we want: streaming samples in real time, and indication of speech starting and stopping. Anything else? I can open an issue to ask.

Tamas G @Tamasg@mindly.social

@fastfinge @tspivey I think yeah, a way to inject stop sequences mid-speech as well (so we could call a shut-up or stop from the main thread during playback) - having callbacks for stop can be nice, sometimes we can gather that just on the basis of the audio buffer closing itself if that's done in realtime with speech fragments.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@Tamasg @tspivey Issue is here if anyone wants to chime in: github.com/thewh1teagle/kokoro-onnx/issues/13

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@Tamasg @tspivey And here's how to do streaming: github.com/thewh1teagle/kokoro-onnx/blob/main/examples/with_stream.py

JamminJerry @JamminJerry@mastodon.stickbear.me

@fastfinge oh wow! I really really like that voice! that would be awesome for reading with.

Serena 🏳️‍🌈 @SerenaTori@dragonscave.space

@fastfinge @FreakyFwoof Yeah, that sounds amazing. I would love to read stuff with that synthesiser.

Andre Louis @FreakyFwoof@universeodon.com

@SerenaTori @fastfinge Now to gently gently request @Tamasg to make it a reality... Haha

Tamas G @Tamasg@mindly.social

@FreakyFwoof @SerenaTori @fastfinge Ha. I know very little about how we could get it compiled right in the add-on. (I know there was a discussion of this earlier, so if that build process for onnxruntime into the add-on succeeded, would love some basic copy then.) For anyone wanting to try, I think looking at something like the Brailab driver (which is super minimal, and in the end all you're really going to use are the getters and setters for the synth driver; the way you do speech is obviously not at all like Brailab), and then crafting in the code to open the stream might work. But between the latest family emergency and work at Spotify with the new year / new projects, I'm afraid I'll be too swamped for a while to give it that truly comparative look. I'd also love to see a test run of how quickly it can synthesize speech on slower CPUs, especially when that speech is interrupted mid-utterance: how does it handle stopping a stream and loading a new one, is there lots of latency? A simple py test that just throws lots of speech chunks like that, stops, starts, would give us an idea, maybe to then know if it's worth turning into a driver just yet.
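
A simple timing test along those lines might start out like this. It is only a sketch: `fake_synthesize` is a stand-in that sleeps instead of running the model, and `time_chunks` is a hypothetical helper. To measure the real thing, the stand-in would be swapped for a function that calls the synth.

```python
import time

def fake_synthesize(text):
    """Stand-in for a real synth call; sleeps in proportion to text length."""
    time.sleep(0.001 * len(text))
    return [0.0] * len(text)

def time_chunks(synthesize, chunks):
    """Return how many seconds each chunk took to synthesize."""
    timings = []
    for chunk in chunks:
        start = time.perf_counter()
        synthesize(chunk)
        timings.append(time.perf_counter() - start)
    return timings

timings = time_chunks(fake_synthesize, ["Hello.", "A longer line of text.", "Stop."])
print("worst: %.3f s, mean: %.3f s" % (max(timings), sum(timings) / len(timings)))
```

The worst-case figure matters most for a screen reader, because it bounds how long a user waits for speech after moving to a new line.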

Andre Louis @FreakyFwoof@universeodon.com

@Tamasg @SerenaTori @fastfinge Sorry to hear about family emergencies, never nice to deal with. I hope things can be sorted out for the better.

Re slow CPU though, that's where I come in. I am right now even, using an Intel Core I5-3570K from 2012. It runs every synth very well, apart from Piper which it struggles with due to the neural aspect of it. If my machine can run... Whatever you guys end up coming up with (hopefully) then anything else should be a breeze.

Mira 🤞🇧🇬🇭🇺 @tardis@tardis.pw

@FreakyFwoof @Tamasg @SerenaTori @fastfinge I have an even slower one. Yay for countries in the middle of... Well somewhere, and computers from 2009 haha if something can even run on that, I'd be surprised. How's that for a slow processor? It's pretty ancient. The synth sounds nice, yeah, don't like how it reads hashtag, but I guess that's me. There's also something about question marks it clearly missed, but I think it needs to be fed a bigger chunk of text to see if it'll sound better. Otherwise, for the quality, Bleh, either my ears, or something, do not consider it a great quality in the sound terms, but for a TTS, I guess it's good. says the person who daily drives a TTS that came out in 2001. LOL.

Andre Louis @FreakyFwoof@universeodon.com

@tardis @Tamasg @SerenaTori @fastfinge What's your CPU spec then, and your daily synth of choice?

Mira 🤞🇧🇬🇭🇺 @tardis@tardis.pw

@FreakyFwoof @Tamasg @SerenaTori @fastfinge A synth that does English people no good. Haha. And I have a dell from 2009, it has still a 32Bit windows 10 version, so it tells you something. :D

Mira 🤞🇧🇬🇭🇺 @tardis@tardis.pw

@FreakyFwoof @Tamasg @SerenaTori @fastfinge I also cannot tell you the full specs. Computer not here, sadly. It has a removable battery though, that gave up a long time ago, then I fell down some stairs while carrying said computer, and the pixels in the screen went poof, and no screen.

x0 @x0@dragonscave.space

@fastfinge I wonder if Sonata would try to incorporate it? The trick with stuff like this is you might actually want to use a server process model rather than trying to run it from within NVDA itself.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@x0 Yeah, it does have a ton of dependencies. I will say all of the voices are better than Sonata/piper, IMHO. Even if it does look like they're all eleven labs ripoffs.

Peter Vágner @pvagner@fedi.ml

@fastfinge I am wondering how it compares to #optispeech developed by @mush42
Or which one is more likely to get more support and be preferred.
github.com/mush42/optispeech

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@pvagner @mush42 I'm not sure. I do kind of worry about a tts developed by and for blind people and if it can be kept up to date and maintained.

Peter Vágner @pvagner@fedi.ml

@fastfinge I understand @mush42 has made very significant progress, for example as compared to piper TTS. To me it looks like it's much lighter for both training and using the trained model, even enhancing audio quality and intelligibility in the process. This is just my guess, but with such an achievement it's fine not to limit it to a blind audience exclusively. This is how I am seeing #optispeech. However, I haven't played with kokoro TTS, so I asked how much you like it compared to something else, perhaps piper TTS if you know that one.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@pvagner @mush42 I like kokoro much better than piper. It sounds more natural with fewer artifacts.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

For the curious, here are all the available Kokoro English voices reading the same text: share.interfree.ca/app/open/3ca2Pb7oiFL-4PXSMD6qMeT-Pb1wqmSJyid-NP7MAAJS12s?view=1

Zach Bennoui @ZBennoui@dragonscave.space

@fastfinge I heard about this project a few months back when it was still just a Huggingface demo. The model was trained on outputs from proprietary TTS systems including Eleven Labs and Open AI, hence why the quality is so good. Really cool project, and the model is still being worked on.

James Scholes @jscholes@dragonscave.space

@fastfinge I suspect your first big headache will be getting onnxruntime (and any other heavy dependencies) installed into the add-on's environment. Doesn't look like simple pure Python code.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@jscholes You can just do it with pip install --target=. to force pip to install a package and all dependencies to the current directory. Then import from the extension directory. The only issue is I'm not sure if onnxruntime has 32-bit binaries or if I'll need to cross-compile the wheel from source.
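
The "import from the extension directory" step can be sketched as follows. The folder name `deps` and the helper `add_vendor_dir` are hypothetical; the packages would have been installed there with something like `pip install --target=deps`. This says nothing about whether onnxruntime's compiled binaries match NVDA's Python, which is the open question above.

```python
import os
import sys

def add_vendor_dir(base_dir, name="deps"):
    """Put base_dir/name at the front of sys.path, once, and return the path."""
    vendor = os.path.abspath(os.path.join(base_dir, name))
    if vendor not in sys.path:
        sys.path.insert(0, vendor)
    return vendor

# In an add-on, base_dir would typically be os.path.dirname(__file__).
vendor = add_vendor_dir(os.getcwd())
print(vendor in sys.path)
```

Putting the folder first on the path means the bundled copies win over any same-named packages that happen to be importable already.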

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

It's super simple to set up if you want to play. Make a folder for it, change into the folder in your terminal, then do:
pip install --target=. kokoro-onnx soundfile
winget install wget
wget github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/kokoro-v0_19.onnx
wget github.com/thewh1teagle/kokoro-onnx/releases/download/model-files/voices.json

Then you can call it from Python. It supports US and UK English, plus French, Korean, Chinese, and Japanese.

Florian @zersiax@cupoftea.social

@fastfinge outside of the dev guide and addon dev guide on github, not ... really, that I know of. Admittedly, those resources HAVE gotten a bit better as of late

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

Yeah, I am deeply confused about how buffers work and how to indicate when speaking is complete and do indexing and so-on. If this is going to be an #NVDA addon, someone else will have to do it.

Tyler Spivey @tspivey@dragonscave.space

@fastfinge You need support from the synth for some features. This one doesn't have anything. Once it starts speaking, it blocks until it's done, so you can't interrupt it.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@tspivey Wouldn't you just stop playing the samples it gave you?

Tyler Spivey @tspivey@dragonscave.space

@fastfinge That works. But you're still sitting there waiting a few seconds for it to finish generating them.

Tyler Spivey @tspivey@dragonscave.space

@fastfinge Taking this sentence and passing it straight through, it pauses after highly. That's not even that many words. He had quietly gone to Madame Pomfrey, who had regretfully told him that Dreamless Sleep was highly addicting and that while she could give him the occasional dose, it would have to be spread out enough to prevent it from becoming addicting – meaning he could only take it one night out of every two weeks or so.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@tspivey Hmmm, I assumed that was just because I was passing an enormous text block with multiple sentences. Hadn't tested with single sentences yet.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@tspivey Also, how does NVDA chunk text it passes to a synth? Even that's not really documented anywhere LOL. I think Kokoro inference would need running in its own thread, so the thread could be killed when we wanted to stop speech rather than generating extra samples, and a new thread could be started so you could start the new speech quickly, like when someone's pressing down arrow rapidly. But I don't have the time, and I'm not smart enough.
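
A rough sketch of the own-thread idea follows. Python threads cannot be forcibly killed, so this swaps in a cooperative approach: a `threading.Event` is checked between chunks. `speak` and `slow_synth` are hypothetical stand-ins, and no real synth or audio device is involved.

```python
import threading
import time

def speak(chunks, synthesize, play, cancel):
    """Synthesize and play chunks in order, stopping once cancel is set."""
    for chunk in chunks:
        if cancel.is_set():
            return
        samples = synthesize(chunk)
        if cancel.is_set():
            return  # drop audio generated just before the stop request
        play(samples)

played = []
cancel = threading.Event()

def slow_synth(text):
    time.sleep(0.05)  # stand-in for model inference
    return text

worker = threading.Thread(
    target=speak,
    args=(["one", "two", "three", "four"], slow_synth, played.append, cancel),
)
worker.start()
time.sleep(0.12)  # let a couple of chunks through
cancel.set()      # e.g. the user pressed down arrow again
worker.join()
print(played)
```

The stop only takes effect between chunks, so the longest single inference call sets the worst-case delay before new speech can start. That is one more reason to keep chunks short.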

Tyler Spivey @tspivey@dragonscave.space

@fastfinge It doesn't. It leaves that up to the synth. If you're doing say all, then it tries to split by sentence and does it badly.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@tspivey So if I cursor up onto a line with fifty thousand characters, that's why it just dies. Ah.

Tyler Spivey @tspivey@dragonscave.space

@fastfinge Yep. There are workarounds for that, disabling NVDA's processing improves it.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@tspivey Yeah, I'm increasingly convinced that @x0 is correct, and this would need to be part of Sonata if this was going to happen at all. They seem to have solved those issues mostly.

Tyler Spivey @tspivey@dragonscave.space

@fastfinge To do that, you need toggleX, then NVDA+0, z. www.dropbox.com/scl/fi/qgz98942oyhv4b3crrpr5/toggleX.nvda-addon?rlkey=hhnevrqlrheiqk9fryoujprwu&dl=1

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@tspivey That gives me: TypeError: GlobalPlugin.script_toggleX.<locals>.<lambda>() got an unexpected keyword argument 'normalize'

Sean Randall @cachondo@defcon.social

@fastfinge @tspivey my wife and I had an unexpected keyword argument a few years ago. She'd never heard of the word nomenclature.
I ended up with gnomes on a clay chair as a whacky present as a reminder of the utter ridiculousness of the discussion.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@cachondo @tspivey Sounds like when I discovered my boss had never heard the word cogitate. Native English speaker, with a master's degree. Go figure.

Sean Randall @cachondo@defcon.social

@fastfinge @tspivey I read an article a while ago about a young American whose dad hadn't heard of a word she'd picked up at college. It wasn't a particularly complicated or unusual word, but much was made of it in this article.

I sometimes wish I had a searchable text file of everything my screen reader ever said.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@cachondo @tspivey Then we could feed it into ollama and have AI search it for us! LOL

Tyler Spivey @tspivey@dragonscave.space

@fastfinge Ok, redownload and that should be fixed.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@tspivey Yup, fixed! Are there docs?

Tyler Spivey @tspivey@dragonscave.space

@fastfinge Nope. There's a toggle.txt in the root of the addon, but I don't know how updated that is. This thing has been hacked on over the years.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@tspivey Yeah, I can tell. TonyML's earcons addon also breaks a bunch of the features rofl.

Zach Bennoui @ZBennoui@dragonscave.space

@tspivey @fastfinge I'm not sure this is the reason for pausing, but the model has a total context size of 500 characters and will not do well with input longer than that. It may also just be bad training data, sentences not ending with correct punctuation, primarily trained on paragraphs, etc. I’ve trained many TTS models over the last few years and data quality is extremely important, something lacking in most open source TTS systems out there.
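
If the 500-character figure holds, input could be capped before it reaches the model. The sketch below is one hedged way to do that: `cap_length` is a hypothetical helper that breaks at spaces and only cuts inside a word when a single word exceeds the limit. It collapses runs of whitespace, which should not matter for speech.

```python
def cap_length(text, limit=500):
    """Split text into pieces of at most `limit` characters, breaking at spaces."""
    pieces, current = [], ""
    for word in text.split():
        # A single word longer than the limit is cut hard.
        while len(word) > limit:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:limit])
            word = word[limit:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces

pieces = cap_length("word " * 300)
print(len(pieces), max(len(p) for p in pieces))
```

In practice this would run after sentence splitting, so that only the rare oversized sentence gets broken at an arbitrary space.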

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@ZBennoui @tspivey I think it's something with the onnx implementation actually. The pytorch version doesn't have this issue. There's an open issue looking into it.

🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca

@tspivey That's why you start a session, so the model stays loaded in memory. Then I think you can actually stream output from onnxruntime byte by byte, I'm just not sure how.