🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
Admin
completely blind computer geek, lover of science fiction and fantasy (especially LitRPG). I work in accessibility, but my opinions are my own, not those of my employer. Fandoms: Harry Potter, Discworld, My Little Pony: Friendship is Magic, Buffy, Dead Like Me, Glee, and I'll read fanfic of pretty much anything that crosses over with one of those.
keyoxide: aspe:keyoxide.org:PFAQDLXSBNO7MZRNPUMWWKQ7TQ
Location: Ottawa
Birthday: 1987-12-20
Pronouns: he/him (EN)
Matrix: @fastfinge:interfree.ca
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@spacepup @Tamasg From my perspective, I'll keep on working as long as other people will keep on working. This is an absolutely huge, enormous project. We need as many people as possible, and we need people willing to develop expertise. In a perfect world, we'd have a head of phonemizers, a head of Klatt synthesis, a head of IPA conversion, a head of phoneme tuning, a head of European languages, a head of Asian languages, and a head of integrations and cross-platform support. And each of those heads would have a team of like 5-10 people. But in the real world, there are like two and a half people with full-time jobs doing this as a hobby.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@Tamasg @cachondo @jscholes @ZBennoui It really feels like AI should be able to help us here. But I'm not sure how. Some kind of system that takes a waveform and finds the closest approximation it can get by modifying our parameters.
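The "takes a waveform and finds the closest approximation by modifying our parameters" idea is usually called analysis-by-synthesis. A minimal sketch of the loop, with a toy one-parameter stand-in synthesizer (every name here is hypothetical; a real version would drive the actual Klatt parameter set):

```python
# Toy analysis-by-synthesis sketch: fit one synthesizer parameter to a
# target waveform by minimizing spectral distance via grid search.
import numpy as np

SR = 16000  # sample rate, Hz

def synth_tone(formant_hz, dur=0.1):
    """Stand-in 'synthesizer': a damped sine at formant_hz."""
    t = np.arange(int(SR * dur)) / SR
    return np.sin(2 * np.pi * formant_hz * t) * np.exp(-t * 20)

def spectral_distance(a, b):
    """Compare magnitude spectra, not raw samples, so phase doesn't matter."""
    fa = np.abs(np.fft.rfft(a))
    fb = np.abs(np.fft.rfft(b))
    return float(np.mean((fa - fb) ** 2))

def fit_formant(target, candidates):
    """Return the candidate parameter whose synthesized output best matches."""
    return min(candidates, key=lambda f: spectral_distance(synth_tone(f), target))

target = synth_tone(700)  # pretend this is a recording of the reference voice
best = fit_formant(target, range(300, 1200, 50))
print(best)  # 700
```

A real Klatt synth has dozens of interacting parameters, so grid search wouldn't scale; this would want a proper optimizer or a learned model, but the loss-against-a-reference-spectrum structure stays the same.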
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@clv1 @cachondo @Tamasg @jscholes @ZBennoui While the code is all cross-platform, and there's no reason it shouldn't work on Linux and Mac, we're best to stick with Windows and NVDA exclusively until we get something we all love.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@jscholes @cachondo @Tamasg @ZBennoui Yup, agreed. And also, people have different preferences. So we can and do get contradictory feedback. Sometimes even from the same person LOL. On top of it all, most people don't have the vocabulary to talk about this stuff. Heck, I don't even have it; I'm not sure the words exist in English. Brighter? Rounder? What do I mean? Do I mean the same thing that you mean? Impossible to tell.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@Tamasg @cachondo @jscholes @ZBennoui Yup. This all comes back to that problem of so many systems all needing tuning. I do think it would really help us to just focus on a single one, i.e. get this voice sounding correct with pure "aaaaaaaa" and "eeeeee" tones. No words, no pitch or intonation, nothing. Then hook that up to the rest of the systems. Then see where we land and tackle the next thing. Because thinking about the phonemizer and the IPA rules and the intonation system all at once is burning us out and distracting us from finishing the thing we have the most control over: i.e. the Klatt synthesizer and the wave generator. Those are ours, entirely and completely. The other things are not, so those problems are harder.
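The "pure tone, no words, no intonation" experiment can be sketched in a few lines: a Klatt-style cascade is an impulse train (the glottal source at a fixed pitch) pushed through second-order formant resonators. The formant values below are generic textbook-ish numbers for illustration, not the project's tuned parameters:

```python
# Minimal Klatt-style cascade sketch: impulse train -> formant resonators
# -> a steady vowel-like "aaa", with no text, phonemes, or intonation.
import numpy as np

SR = 16000

def resonator(x, f, bw):
    """Klatt second-order resonator at centre frequency f (Hz), bandwidth bw (Hz)."""
    T = 1.0 / SR
    c = -np.exp(-2 * np.pi * bw * T)
    b = 2 * np.exp(-np.pi * bw * T) * np.cos(2 * np.pi * f * T)
    a = 1 - b - c
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = a * x[n] + b * (y[n - 1] if n >= 1 else 0) + c * (y[n - 2] if n >= 2 else 0)
    return y

def steady_vowel(f0=120, formants=((730, 90), (1090, 110), (2440, 160)), dur=0.25):
    """Impulse train at pitch f0 through a cascade of three formant resonators."""
    n = int(SR * dur)
    src = np.zeros(n)
    src[:: SR // f0] = 1.0          # one impulse per pitch period
    out = src
    for f, bw in formants:
        out = resonator(out, f, bw)  # cascade: each formant filters the last
    return out / np.max(np.abs(out))  # normalize to +/-1

wave = steady_vowel()
```

With everything else held constant, the only knobs left are the formant frequencies and bandwidths, which is exactly the isolation being argued for above.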
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@cachondo @Tamasg @jscholes @ZBennoui We also lack tools as blind people. For example, I wish I could visually examine the shape and spectrogram of sounds. A useful step here would be to get Eloquence to generate a pure, single-note open "aaa" or "eee" tone, and understand the shape of the resulting sound wave. But there are just no accessible tools to do that. So instead we have to go by listening and guesswork.
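One accessible alternative to a visual spectrogram is to report the spectrum as numbers a screen reader can speak. A sketch, using a synthetic tone as a stand-in for a recorded "aaa" (the function names are made up for this example):

```python
# Non-visual "spectrogram" sketch: print the loudest spectral peaks of a
# sustained tone as plain numbers instead of a picture.
import numpy as np

SR = 16000

def dominant_peaks(wave, count=3):
    """Return the `count` loudest frequencies in Hz, loudest first."""
    spec = np.abs(np.fft.rfft(wave * np.hanning(len(wave))))
    freqs = np.fft.rfftfreq(len(wave), 1.0 / SR)
    order = np.argsort(spec)[::-1]       # bins sorted loudest-first
    peaks = []
    for i in order:                      # greedily keep peaks >= 50 Hz apart
        if all(abs(freqs[i] - p) > 50 for p in peaks):
            peaks.append(float(freqs[i]))
        if len(peaks) == count:
            break
    return peaks

t = np.arange(SR) / SR                   # one second of audio
tone = (np.sin(2 * np.pi * 440 * t)
        + 0.5 * np.sin(2 * np.pi * 880 * t)
        + 0.25 * np.sin(2 * np.pi * 1320 * t))
print([round(p) for p in dominant_peaks(tone)])  # [440, 880, 1320]
```

For a vowel, those peaks would correspond roughly to the formants, which is the information a sighted person would read off a spectrogram.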
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@jscholes @cachondo @Tamasg @ZBennoui The other thing that makes this super, super hard is that there are like nine different systems, and all of them need tuning. And it's impossible to ask people who haven't spent pretty much four days straight thinking exclusively about this for feedback on a particular system, because they all work together to make up the voice, and you can't know where any given issue comes from. There's the rules for going from text to IPA phonemes. Then the rules for determining the way IPA phonemes are actually voiced and fit together. And then there's the intonation table. And then there's the two systems that actually make the sound. Right now I'm mostly looking at the system that actually makes the sound, i.e. when you do "aaaaaaaaaaaaa" or "eeeeeeeeeeeeeeeee", because that's still not right. But because it's an entire voice, it's even hard for me to separate my own perceptions and fix anything.

The important thing to remember is that Eloquence began development in 1982, by a team of about a dozen researchers. It wasn't in the state we know it until around 2002. We have existing research to build on, but no funding and fewer people, and no PhD-level speech researchers. So actually doing this, even with the help of AI, is a 20-30 year project before we get close to Eloquence levels. Because we have something that "works" and improves step by step, it's easy to lose sight of the size of the problem we're taking on, because it feels like we should be able to get there in a month or two. But that's not realistic.
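The stages described above can be sketched as a pipeline of functions. Every name and rule here is illustrative, not the project's real API; the point is that each stage has its own tunable rules, and an error you hear in the output could have entered at any of them:

```python
# Hypothetical sketch of the TTS pipeline stages described above.
from typing import List

def text_to_ipa(text: str) -> List[str]:
    """Stage 1: spelling -> IPA phonemes (toy one-word lookup)."""
    return {"speech": ["s", "p", "i:", "tʃ"]}.get(text, [])

def ipa_to_targets(phonemes: List[str]) -> List[dict]:
    """Stage 2: phonemes -> per-phoneme voicing/duration targets (made up)."""
    return [{"phoneme": p, "voiced": p not in ("s", "p", "tʃ"), "ms": 80}
            for p in phonemes]

def apply_intonation(targets: List[dict]) -> List[dict]:
    """Stage 3: overlay a pitch contour from the intonation table (flat here)."""
    return [dict(t, pitch_hz=120) for t in targets]

def synthesize(targets: List[dict]) -> bytes:
    """Stages 4-5: parameter frames -> waveform (stubbed as silent bytes)."""
    return b"\x00" * sum(t["ms"] for t in targets)

audio = synthesize(apply_intonation(ipa_to_targets(text_to_ipa("speech"))))
```

Because the output of each stage is the input of the next, a wrong-sounding vowel could be a bad IPA rule, a bad voicing target, a bad pitch contour, or a bad synthesizer parameter, and listening alone can't tell you which.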
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@ginsenshi @Tamasg @ZBennoui Hmmm a bit, yeah.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@MariahL @cachondo @Tamasg @jscholes @ZBennoui I hear where you're coming from. It needs to be...rounder or wider or more relaxed or something. But I'm at the point where every change I make causes it to sound worse or introduces strange new issues.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@Tamasg @cachondo @jscholes @ZBennoui Aww. This is a super hard problem. Especially because nobody can articulate what they want, we all just know when it's wrong.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@Tamasg @cachondo @ZBennoui Yup. And I find I have to swap back to Eloquence frequently, or else I lose my way completely, and everything starts to sound fine to me.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@jscholes @cachondo @Tamasg @ZBennoui So some of the issues you're identifying have to do with the eSpeak phonemizer, and the way phonemes are tuned. We have a lot of work to do there, too. But I feel like if we can get the sound of the voice correct, that will be a lot easier. For example, compare Speech Player and Eloquence saying "eeeeeeeeee". Side by side, it's clear something is still not right with Speech Player. It needs to be...rounder and brighter? And...those are not useful terms because I'm still struggling to define exactly what I mean by them haha.
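One way to pin down a fuzzy word like "brighter" is to turn it into a number: the spectral centroid (magnitude-weighted mean frequency) tends to track perceived brightness. A sketch comparing two synthetic stand-ins for the two engines' sustained vowels:

```python
# "Brighter" as a number: compare spectral centroids of two sustained tones.
import numpy as np

SR = 16000

def spectral_centroid(wave):
    """Magnitude-weighted mean frequency of the signal, in Hz."""
    spec = np.abs(np.fft.rfft(wave))
    freqs = np.fft.rfftfreq(len(wave), 1.0 / SR)
    return float(np.sum(freqs * spec) / np.sum(spec))

t = np.arange(SR // 4) / SR
dull = np.sin(2 * np.pi * 200 * t)                  # energy low in the spectrum
bright = dull + 0.8 * np.sin(2 * np.pi * 3000 * t)  # extra high-frequency energy

print(spectral_centroid(dull) < spectral_centroid(bright))  # True
```

Running the same measurement on recordings of both engines saying "eeee" would turn "it needs to be brighter" into "its centroid is N Hz lower", which is something you can actually tune toward.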
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5d
@ZBennoui @Tamasg Can you articulate what you dislike about it? I'm still not a fan of it the way I am of Eloquence, but it's getting harder and harder for me to define why, so we can actually fix it.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
6d
@FastSM Loving the client. Works perfectly with iceshrimp as my server.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1w
@we_are_spc @jaybird110127 It's still in beta.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1w
On one hand, I absolutely love and adore pull requests. On the other hand, they make me realize just how bad I am at pretending to be a developer. Anyway, if you use 2026, here's another release of unspoken-ng that fixes Firefox errors while also making everything better because I am apparently incapable of correctly thinking through the effect of threads. You should upgrade ASAP if you use this addon: github.com/fastfinge/unspoken-ng/releases/tag/v1.0.4
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1w
@hosford42 @dpnash Compare that with the version of GNU Speech released in 1995. It still messes up "tear" and "live". But once you get past the unnatural voice, it's far more precise. And once you get used to it, much much easier to listen to at an extremely high rate of speed (4x or more) all day. All text to speech advancement from "AI" is just the wow factor of "Wow, it sounds so human!" But pronunciation...you know, the important part of actually reading text...is either the same or worse. With five thousand times the resources.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1w
@hosford42 @dpnash For example, here's Eleven Labs, the billion dollar voice AI company that's supposed to replace all voice actors forever. I used the voice builder to specifically request received pronunciation. That was not at all what I got. Aside from that, notice the incorrect "tear", pronouncing "plaid" as "played", having no idea that "victual" is pronounced "viddle", and a number of other mistakes. I reran it just now, to be as fair as possible. It has not improved.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1w
@dpnash @hosford42 Right, but most text to speech systems have a UK English setting. And the mistakes they're making are on things much more basic than that. For example, far too many so-called state of the art AI TTS systems can't even pronounce "Susy", "plaid", "fuchsia", and "lieutenants".
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1w
@hosford42 I wish it would. Unfortunately, that code is what we use to keep Eloquence alive in the 64-bit NVDA version. So it's awful, for dozens of reasons. This...is a bit clearer? Maybe? Anyway, it's the canonical example of how NVDA officially wants to interact with a text to speech system, written by the NVDA developers themselves. Any text to speech system useful for screen reader users needs to expose everything required for someone to write code like this. Not saying you could or should; there are dozens of blind folks who can do the job of integrating any text to speech system with all of the various APIs on all the screen readers and platforms. But we have to have useful hooks to do it. github.com/nvaccess/nvda/blob/master/source/synthDrivers/espeak.py