🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4mo
The State of Modern AI Text To Speech Systems for Screen Reader Users: The past year has seen an explosion in new text to speech engines based on neural networks, large language models, and machine learning. But has any of this advancement offered anything to those using screen readers? stuff.interfree.ca/2026/01/05/ai-tts-for-screenreaders.html
Zach Bennoui @ZBennoui@dragonscave.space
4mo
@fastfinge Really interesting article. I'm particularly passionate about this subject; I've been fascinated with TTS for a number of years. I've trained many voices, both for Piper and for some of the newer LLM-based systems, and while I can't speak to the speed issue, training data is extremely important.

What you feed into these models has a big impact on the voice's overall performance. If you give it stuff scraped from the web, random audiobooks that weren't optimized for TTS, things like that, you're not going to get good results for the type of work screen reader users do every day. This applies to all of these systems, not just neural networks. The latency / responsiveness issue is something we'll have to solve at some point, because I don't think using TTS systems last updated in 2003 is going to work out in the long term, as much as I love Eloquence.

In my ideal world, we would have either a machine-learning-based or a formant-based system that is easy to train and maintain. Big companies have lost interest in on-device TTS, and not just for screen reader users. Many of the solutions being put out now are cloud-based, and while developers are still creating on-device models, as the article says, they're not optimized for our needs and may never be. I think we have to take matters into our own hands and figure this out, but I believe with enough people we can make it happen.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4mo
@ZBennoui Agreed. I think Blastbay is close to the right track. If only it were open and the issues with pronouncing words were fixed. The speed and sound of the voices are top-notch.