🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
Admin
completely blind computer geek, lover of science fiction and fantasy (especially LitRPG). I work in accessibility, but my opinions are my own, not those of my employer. Fandoms: Harry Potter, Discworld, My Little Pony: Friendship is Magic, Buffy, Dead Like Me, Glee, and I'll read fanfic of pretty much anything that crosses over with one of those.
keyoxide: aspe:keyoxide.org:PFAQDLXSBNO7MZRNPUMWWKQ7TQ
Location
Ottawa
Birthday
1987-12-20
Pronouns
he/him (EN)
xmpp fastfinge@im.interfree.ca
keyoxide aspe:keyoxide.org:PFAQDLXSBNO7MZRNPUMWWKQ7TQ
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@hosford42 I wish it would. Unfortunately, that code is what we use to keep Eloquence alive in the 64-bit NVDA version. So it's awful, for dozens of reasons. This...is a bit clearer? Maybe? Anyway, it's the canonical example of how NVDA officially wants to interact with a text to speech system, written by the NVDA developers themselves. Any text to speech system useful for screen reader users needs to expose everything required for someone to write code like this. Not saying you could or should; there are dozens of blind folks who can do the job of integrating any text to speech system with all of the various APIs on all the screen readers and platforms. But we have to have useful hooks to do it. github.com/nvaccess/nvda/blob/master/source/synthDrivers/espeak.py
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@hosford42 When it comes to requirements, in general, if it can work with both the SAPI5 and NVDA add-on APIs, it will suit the requirements of Speech Dispatcher on Linux and the macOS APIs. The important thing is that most screen readers want to register indexes and callbacks. So, for example, if I press a key to stop the screen reader speaking, it needs to know exactly where the text to speech system stopped so that it can put the cursor in the right place. It also wants to know what the TTS system is reading so it can decide when to advance the cursor, get new text from the application to send for speaking, etc. I really, really wish I had a better example of how that works in NVDA than this: github.com/fastfinge/eloquence_64/blob/master/eloquence.py
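To make that index/callback contract concrete, here's a minimal Python sketch. The names here are invented for illustration; this is not NVDA's actual API, just the shape of it: the screen reader interleaves index marks with text, and the engine fires a callback as playback reaches each mark, so the reader always knows how far speech actually got.

```python
# Illustrative sketch (invented names, not NVDA's real API): a screen
# reader queues text interleaved with index marks; the synth fires a
# callback as each mark is reached, so the reader can track the cursor
# and decide when to fetch and queue more text.

from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class IndexCommand:
    """Marks a position in the speech stream the screen reader cares about."""
    index: int


class MockSynth:
    """Stand-in for a TTS engine that supports index callbacks."""

    def __init__(self) -> None:
        self.on_index: Callable[[int], None] = lambda i: None
        self._stopped = False

    def speak(self, sequence: list[Union[str, IndexCommand]]) -> None:
        self._stopped = False
        for item in sequence:
            if self._stopped:
                return
            if isinstance(item, IndexCommand):
                # A real engine fires this from its audio thread only as
                # playback actually reaches the mark, not when queued.
                self.on_index(item.index)
            # else: item is text; a real engine would synthesize it here.

    def stop(self) -> None:
        """Interrupt speech; the last index reached tells the reader
        where to leave the cursor."""
        self._stopped = True


# Screen reader side: record which marks were reached.
reached: list[int] = []
synth = MockSynth()
synth.on_index = reached.append

synth.speak(["Hello", IndexCommand(0), "world", IndexCommand(1)])
print(reached)  # -> [0, 1]: both marks reached, so speech completed
```

An engine that only exposes "speak this string to completion", with no marks and no interruption, simply can't support this loop, which is the point about needing useful hooks.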
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@hosford42 Absolutely yes to all of the above. I can think of another 10 people on Mastodon at minimum who are also ready and willing to help wherever they can. Just none of us with the skillset to do the actual work.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@hosford42 Sadly, this is so far outside of my expertise and abilities it's not even funny. I have an excellent handle on what's needed, and the vague shape of the path forward, but actually doing any of it is way outside of my skillset. If it was anywhere near something I could do, I would have started already. :-)
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@hosford42 Also, if you enjoy comparing modern AI efforts with older rule-based text to speech systems, and listening to the AI fail hard, this text is wonderful for that. As far as I'm aware, not a single text to speech system, right up to the modern day, can read this one hundred percent correctly. github.com/mym-br/gnuspeech_sa/blob/master/the_chaos.txt

But Eloquence gets the closest, GnuSpeech second, eSpeak third, DECtalk fourth, and every AI system I've tried comes a distant last.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@hosford42 If you're going to reimplement something, you might be better off going with GnuSpeech, as it's known to be under the GPL. At the least, it gives you a vocal model to improve on that was coded with open research in mind, rather than proprietary code probably written for job security.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@hosford42 I also have no idea about any associated IP or patents, though. Wouldn't whoever does it need to be able to prove they never saw the original code, just its outputs? Otherwise you're still infringing, aren't you? In this regard, it's probably actually a bad thing that the DECtalk source code is so widely available.

And most of the commits seem to be about just getting it to compile on modern systems with modern toolchains. I dread to think how unsafe closed-source C code written in 1998 is.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@hosford42 In general, for training the rules for pronouncing English, the CMU pronouncing dictionary is used: www.speech.cs.cmu.edu/cgi-bin/cmudict
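For anyone unfamiliar with it, the CMU dictionary is plain text: one word per line followed by its ARPAbet phonemes, with "(2)" marking alternate pronunciations and ";;;" starting comments. Here's a minimal parser sketch in Python; the embedded sample entries follow the dictionary's format:

```python
# Minimal parser for the CMUdict plain-text format: each line is
# "WORD  PH0 PH1 ...", variants are written "WORD(2)", and comment
# lines start with ";;;".

sample = """\
;;; sample entries in cmudict format
HELLO  HH AH0 L OW1
READ  R EH1 D
READ(2)  R IY1 D
"""

def parse_cmudict(text: str) -> dict[str, list[list[str]]]:
    """Map each word to its list of pronunciations (phoneme lists)."""
    entries: dict[str, list[list[str]]] = {}
    for line in text.splitlines():
        if not line or line.startswith(";;;"):
            continue
        head, *phones = line.split()
        word = head.split("(")[0]  # strip the "(2)" variant marker
        entries.setdefault(word, []).append(phones)
    return entries

d = parse_cmudict(sample)
print(d["READ"])  # -> [['R', 'EH1', 'D'], ['R', 'IY1', 'D']]
```

The stress digits on the vowels (AH0, OW1) are part of the format, and they matter: they're exactly the emphasis information a rule trainer needs.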

When it comes to open-source speech data, LJSpeech is the best we have, though far from perfect:
keithito.com/LJ-Speech-Dataset/

And here's a link to GnuSpeech, the only open-source fully articulatory text to speech system I'm aware of:
github.com/mym-br/gnuspeech_sa?tab=readme-ov-file

I'm afraid I don't have any particular data of my own.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@hosford42 The source code for DECtalk is out there. Unfortunately, it's...legally dubious at best. It was leaked by an employee back in the day, and now the copyright status of the code is so unclear that nobody can safely use it for anything, but also nobody can demonstrate clear enough ownership to submit a DMCA request and get it taken off GitHub. GnuSpeech is also pretty close to what's needed, but it won't even compile without all the NeXT development tools, I don't think. So at best all it would be is a base for something else; modernizing it would probably effectively be a complete rewrite anyway.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@hosford42 The reason I say systems-level programming is mostly because for a text to speech system used by a blind power user, you need to keep an eye on performance. If the system crashes and the computer stops talking, the only choice the user has is to hard reset. It would be running and speaking the entire time the computer is in use, so memory leaks and other inefficiencies are going to add up extremely quickly.

From what I can tell, the ideal is some sort of formant-based vocal tract model. eSpeak sort of does this, but only for the voiced sounds. Plosives are generated from modeling recorded speech, so they sound weird and overly harsh to most users, and I suspect this is where most of the complaints about eSpeak come from. A neural network or other sort of machine learning model could be useful for discovering the best parameters and running the model, but not for generating the audio itself, I don't think. This is because most modern LLM-based neural models can't allow changing pitch, speed, etc., as all of that comes from the training data.
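To show what "formant-based" means in practice, here's a toy Python sketch: a Klatt-style second-order resonator cascade shaping an impulse-train voice source into a crude vowel. The formant and bandwidth values are assumed for illustration, not tuned; the point is that pitch, speed, and length are just parameters you can change freely, unlike audio baked into a neural model by its training data.

```python
# Toy formant synthesis: an impulse train (the "glottal source") passed
# through two resonators tuned to formant frequencies. Klatt-style
# difference equation: y[n] = a*x[n] + b*y[n-1] + c*y[n-2].

import math

def resonator(signal, freq, bw, fs):
    """Second-order resonance at freq Hz with bandwidth bw Hz, unity DC gain."""
    c = -math.exp(-2 * math.pi * bw / fs)
    b = 2 * math.exp(-math.pi * bw / fs) * math.cos(2 * math.pi * freq / fs)
    a = 1 - b - c
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out

fs = 16000   # sample rate
f0 = 110     # pitch in Hz: trivially changeable, which is the whole point
n = fs // 10 # 100 ms of audio

# Voiced source: one impulse per glottal period.
period = fs // f0
source = [1.0 if i % period == 0 else 0.0 for i in range(n)]

# Cascade two formant resonators; roughly /a/-like values (assumed).
samples = resonator(resonator(source, 700, 90, fs), 1200, 110, fs)
print(len(samples))  # -> 1600 samples of a crude vowel
```

A real engine adds more formants, a better glottal waveform, noise sources for fricatives, and per-phoneme parameter tracks, but the control story stays the same: every perceptual property is an explicit parameter.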

Secondly, the phonemizer needs to be reproducible. What if, say, it mispronounces "Hermione"? With most modern text to speech systems, this is hard to fix; the output is not always the same for any given input. So a correction like "her my oh nee" might work in some circumstances but not others, because how the model decides to pronounce words and where it puts the emphasis is just a black box. The state of the art here remains Eloquence, which uses no machine learning at all, just hundreds of thousands of hand-coded rules and formants. But, of course, it's closed source (and as far as anyone can tell the source has actually been lost since the early 2000s), so goodness knows what all those rules are.
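Here's a deliberately tiny Python sketch (all names invented) of why determinism matters: with a user exception dictionary layered over fixed letter-to-sound rules, a correction applies identically every single time, which a sampling neural model can't guarantee.

```python
# Hypothetical sketch: a deterministic phonemizer checks a user
# exception dictionary first, then falls back to fixed rules. Same
# input, same output, always, so a correction like "her my oh nee"
# for "Hermione" holds in every context.

# User corrections, stored as respellings (names and values invented).
EXCEPTIONS = {"hermione": "her my oh nee"}

# A deliberately trivial stand-in for "hundreds of thousands of rules":
# map each letter to itself. Real rule sets are context-sensitive.
LETTER_RULES = {ch: ch for ch in "abcdefghijklmnopqrstuvwxyz"}

def phonemize(word: str) -> str:
    key = word.lower()
    if key in EXCEPTIONS:  # user corrections always win
        return EXCEPTIONS[key]
    # Fall back to the rules; fully deterministic, so the output for a
    # given word never varies between calls or contexts.
    return " ".join(LETTER_RULES.get(ch, ch) for ch in key)

assert phonemize("Hermione") == phonemize("HERMIONE")  # reproducible
print(phonemize("Hermione"))  # -> her my oh nee
```

With a stochastic model there is no equivalent layer to intercept: the correction would have to fight the sampler on every utterance.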
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@hosford42 Sadly, there is no money in solving any of my problems. If there was, someone would have solved them. See, for example, my complaints about text to speech systems. stuff.interfree.ca/2026/01/05/ai-tts-for-screenreaders.html

I can go into more detail about why all the options are bad if you want. But this is the sort of problem that eats years of your life, requires advanced mathematics (digital signal processing at a minimum), and advanced linguistics, on top of being a good systems-level programmer.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@Tamasg Have you tried Gemini? I find it's the best for stuff where I don't know exactly what I want.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@jscholes Maybe. I've played enough of this style of game that I have pretty strong opinions about how it should all work, though.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@jscholes If I ever finish the basics, the thing I'm aiming for, probably in 70 years or so, is a singleplayer Trade Wars/Elite style space game. Planetary exploration, trading, combat, procedurally generated galaxy, etc. All the stuff we have in that style is either multiplayer PVP, or only semi-accessible like smugglers5. Interface via console menus, because that feels both simple and retro. But with stuff happening in realtime like Textspaced used to (it's now discontinued), and high quality sound. The idea is you just let it run on your taskbar and check in every 20-30 minutes when you hear something that needs doing. So kind of casual all-day play while multitasking. I've dreamed of something like that for like 20 years, but nobody has done anything even close. Textspaced was the nearest, but it never quite got there, and it's gone now.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@jscholes Hah. So this all started when I decided I wanted a class where I could create a new menu, add the selection key, a name, and a callback function for each item, then call the menu to print itself and do all the input and error checking, and call the callback for whatever item was chosen. And then it just kept growing. I'm not even at the object graph stage. I just hate most of the text games I mentioned, where every menu looks and acts different, because print statements and case statements are just sprinkled all over the code.
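That class might look something like this Python sketch (all names and details invented here, not the actual code): each item registers a selection key, a label, and a callback; the menu prints itself, validates input, and dispatches to whatever was chosen.

```python
# Sketch of a reusable console menu: register (key, label, callback)
# items, then let the menu handle printing, validation, and dispatch,
# so every menu in the game looks and acts the same.

from typing import Callable


class Menu:
    def __init__(self, title: str) -> None:
        self.title = title
        self.items: dict[str, tuple[str, Callable[[], None]]] = {}

    def add(self, key: str, label: str, callback: Callable[[], None]) -> None:
        self.items[key.lower()] = (label, callback)

    def render(self) -> str:
        """Build the menu text; printing is left to the caller."""
        lines = [self.title]
        lines += [f"  [{k}] {label}" for k, (label, _) in self.items.items()]
        return "\n".join(lines)

    def choose(self, key: str) -> bool:
        """Dispatch the callback for key; return False if it's invalid."""
        entry = self.items.get(key.strip().lower())
        if entry is None:
            return False
        entry[1]()
        return True

    def run(self, read=input, write=print) -> None:
        """Print the menu and loop until a valid selection is made."""
        write(self.render())
        while not self.choose(read("> ")):
            write("Invalid choice; try again.")


# Usage: callbacks go in, the menu does the rest.
log: list[str] = []
m = Menu("Main menu")
m.add("t", "Trade", lambda: log.append("trade"))
m.add("q", "Quit", lambda: log.append("quit"))
m.choose("T")
print(log)  # -> ['trade']
```

Keeping `render` and `choose` separate from the `run` loop is what stops print and case statements from sprinkling themselves across the codebase: every screen reuses the same input handling.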
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@jscholes If I do it that way, it just means refactoring later, though. And that's even harder and less fun because now you're rebuilding things that already exist.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
Games I love like Warsim, The Wastes, and Usurper inspired me to think about creating my own console-based game. Then I wrote 750 lines of code just to make a reusable system for console menus. And realized the save system is going to be another 500 lines, probably. And then the settings system. And after thinking about 2000 lines of code before I can get to anything even resembling the simplest game mechanic, I'm not inspired anymore. Why is 99 percent of programming doing the least interesting part of any project?
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
I've only played a couple of minutes, but Usurper Reloaded looks really good and like loads of fun, with all kinds of content for an alpha 0.1. I never played the original BBS game, so I don't know how it compares. Run the .bat file to get the console version: github.com/binary-knight/usurper-reloaded/
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@clv1 @cachondo @Tamasg @jscholes @FreakyFwoof @pixelate @ZBennoui @amir And also UX researchers, probably. I can't articulate why Eloquence is better than DECtalk, for me. Neither, I bet, could Andre articulate what makes Orpheus better than Eloquence, for him. So getting something that makes the largest number of people as happy as possible is a classic UX research problem, probably involving massive surveys, rating and ranking of samples, and so on. I work with the kind of people qualified to do this, and it's a unique skill set in and of itself.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
3mo
@MostlyBlindGamer But I'm not disappointed that the package description does nothing to explain why the package exists, why it's better, or why it should be used over the dozens of other ORM-like options.