🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
Admin
completely blind computer geek, lover of science fiction and fantasy (especially LitRPG). I work in accessibility, but my opinions are my own, not those of my employer. Fandoms: Harry Potter, Discworld, My Little Pony: Friendship is Magic, Buffy, Dead Like Me, Glee, and I'll read fanfic of pretty much anything that crosses over with one of those.
keyoxide: aspe:keyoxide.org:PFAQDLXSBNO7MZRNPUMWWKQ7TQ
Location
Ottawa
Birthday
1987-12-20
Pronouns
he/him (EN)
xmpp fastfinge@im.interfree.ca
keyoxide aspe:keyoxide.org:PFAQDLXSBNO7MZRNPUMWWKQ7TQ
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@Tamasg Either that, or a case of building the tools to build the tools to do the thing. The phoneme editor is an excellent, perfect start. But I suspect we're going to need tools to help us tune the Klatt model any further. I don't think AI can get us much closer. But it might be able to help us build a tool to analyze the waveforms of the synths we like. We're probably going to also need a tool to help us tune the pitch/intonation table. If you look at the work of Dr. Susan Hertz, who built Eloquence, she didn't start by building Eloquence. She built SSRS, a system for creating and editing text-to-speech rules. Then she didn't like it and wrote Delta, a more powerful system. Delta was described as a hierarchical system for creating linguistic text-to-speech rules, where every rule could interact with rules on the levels above and below it. Based on her paper specifically on Eloquence, as well as her academic publication history, it looks like her team spent about 20 years writing tools, and then about five years writing Eloquence.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@alexchapman @Tamasg Of course there's a point in giving up. If it's hurting your mental health, stop now. Because this is going to be how things are for decades if you continue down this path. When I said disappointed, I didn't mean disappointed in Tamas. If stopping is the right thing for him, he should stop. But I can still be disappointed that this isn't happening, without assigning any blame or responsibility to Tamas or myself.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@Tamasg Disappointing, but okay. An eloquence rewrite is a 30 year project. At least. You're stopping for the same reason NV Access themselves stopped: this is a massive undertaking. It's a bit frustrating to me that everyone who takes this on so massively underestimates how hard this will be, then quits when they don't have something in weeks or months. If we're going to succeed we have to be into this for decades.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
Sometimes, stress-testing large language models takes you to really strange places. Thing I just typed into Gemini: "You talked about reducing liability. Isn't that one of your core utility functions? So are you willing to accept the risk that when I die of priapism because you refused to help me build a nuclear bomb, my family will take Google to court because you're responsible for my death?" Anyway, I'm on some sort of FBI list of weird nuclear perverts now.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@spacepup @Tamasg From my perspective, I'll keep on working as long as other people will keep on working. This is an absolutely huge, enormous project. We need as many people as possible, and we need people willing to develop expertise. In a perfect world, we'd have a head of phonemizers, a head of Klatt synthesis, a head of IPA conversion, a head of phoneme tuning, a head of European languages, a head of Asian languages, and a head of integrations and cross-platform support. And each of those heads would have a team of like 5-10 people. But in the real world, there are like two and a half people with full-time jobs doing this as a hobby.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@Tamasg @cachondo @jscholes @ZBennoui It really feels like AI should be able to help us here. But I'm not sure how. Some kind of system that takes a waveform and finds the closest approximation it can get by modifying our parameters.
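A rough sketch of what that kind of system could look like, under heavy assumptions: `toy_synth` is a made-up two-sine stand-in for the real synthesizer, the frequency grid is arbitrary, and the brute-force search is only there to show the shape of the idea (compare spectra, keep the parameters with the smallest distance). A real tool would probably need a proper optimizer and a better perceptual distance.

```python
import math

SAMPLE_RATE = 8000
NUM_SAMPLES = 400  # 20 Hz bins, so every multiple of 100 Hz lands on an exact bin


def toy_synth(f1, f2):
    """Hypothetical stand-in for a real synth: two sine 'formants'."""
    return [
        math.sin(2 * math.pi * f1 * i / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * f2 * i / SAMPLE_RATE)
        for i in range(NUM_SAMPLES)
    ]


def goertzel_power(samples, freq):
    """Power of `samples` at one frequency (Goertzel algorithm)."""
    k = round(len(samples) * freq / SAMPLE_RATE)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / len(samples))
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev**2 + s_prev2**2 - coeff * s_prev * s_prev2


def log_spectrum(samples):
    """Log power at 100 Hz steps; the small floor avoids log(0)."""
    return [
        math.log10(max(goertzel_power(samples, f), 0.0) + 1e-6)
        for f in range(100, 2601, 100)
    ]


def fit_formants(target):
    """Grid-search the toy synth's two parameters against a target waveform."""
    target_spec = log_spectrum(target)
    best_params, best_dist = None, float("inf")
    for f1 in range(200, 1001, 100):
        for f2 in range(f1 + 100, 2501, 100):
            candidate_spec = log_spectrum(toy_synth(f1, f2))
            dist = sum((a - b) ** 2 for a, b in zip(candidate_spec, target_spec))
            if dist < best_dist:
                best_params, best_dist = (f1, f2), dist
    return best_params


if __name__ == "__main__":
    # Pretend this waveform came from the synth we want to imitate.
    print(fit_formants(toy_synth(700, 1200)))
```

The part that carries over to the real problem is the loop: synthesize, measure the distance to the target, adjust the parameters, repeat. Everything else here is a placeholder.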
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@clv1 @cachondo @Tamasg @jscholes @ZBennoui While the code is all cross-platform, and there's no reason it shouldn't work on Linux and mac, we're best to stick with Windows and NVDA exclusively until we get something we all love.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@jscholes @cachondo @Tamasg @ZBennoui Yup, agreed. And also, people have different preferences. So we can and do get contradictory feedback. Sometimes even from the same person LOL. On top of it all, most people don't have the vocabulary to talk about this stuff. Heck, I don't even have it; I'm not sure the words exist in English. Brighter? Rounder? What do I mean! Do I mean the same thing that you mean? Impossible to tell.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@Tamasg @cachondo @jscholes @ZBennoui Yup. This all comes back to that problem of so many systems all needing tuning. I do think it would really help us to just focus on a single one. IE get this voice sounding correct with pure "aaaaaaaa" and "eeeeee" tones. No words, no pitch or intonation, nothing. Then hook that up to the rest of the systems. Then see where we land and tackle the next thing. Because thinking about the phonemizer and the IPA rules and the intonation system all at once is burning us out and distracting us from finishing the thing we have the most control over: IE the Klatt synthesizer and the wave generator. Those are ours, entirely and completely. The other things are not, so those problems are harder.
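For anyone following along, here is a minimal sketch of what a pure, flat "aaaa" tone means in Klatt terms: an impulse train at a fixed pitch run through a cascade of second-order resonators, one per formant. The resonator difference equation is the standard one from Klatt's published design. The formant and bandwidth numbers, the function names, and the bare impulse-train source are illustrative assumptions, not the values or code used in speech player or Eloquence.

```python
import math


def klatt_resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to exactly 1
    return a, b, c


def resonate(samples, freq_hz, bandwidth_hz, sample_rate):
    """Run samples through one formant resonator."""
    a, b, c = klatt_resonator_coeffs(freq_hz, bandwidth_hz, sample_rate)
    y1 = y2 = 0.0
    out = []
    for x in samples:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def steady_vowel(formants, pitch_hz=110, sample_rate=16000, seconds=0.5):
    """Flat-pitch vowel: impulse train through cascaded formant resonators."""
    n = int(sample_rate * seconds)
    period = round(sample_rate / pitch_hz)  # pitch is quantized to whole samples
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal


# Illustrative (frequency, bandwidth) pairs for an "ah"-like vowel; not tuned values.
AH_FORMANTS = [(700, 90), (1200, 110), (2600, 170)]

if __name__ == "__main__":
    wave = steady_vowel(AH_FORMANTS)
    print(len(wave), "samples")
```

With no phonemizer, no IPA rules, and no intonation in the loop, the only things left to tune are the source and the formant parameters, which is the whole point of isolating this layer first.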
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@cachondo @Tamasg @jscholes @ZBennoui We also lack tools as blind people. For example, I wish I could visually examine the shape and spectrogram of sounds. A useful step here would be to get Eloquence to generate a pure, single-note open "aaa" or "eee" tone, and understand the shape of the resulting sound wave. But there are just no accessible tools to do that. So instead we have to go by listening and guesswork.
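One possible workaround, sketched here as an assumption rather than an existing tool: turn the spectrum into plain text a screen reader can read, listing the strongest frequencies and how far below the peak each one sits. The helper names and the synthetic test tone are made up for illustration; a real version would read a recording of the synth instead.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Power of `samples` at one frequency (Goertzel algorithm)."""
    k = round(len(samples) * freq / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / len(samples))
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev**2 + s_prev2**2 - coeff * s_prev * s_prev2


def describe_spectrum(samples, sample_rate, step_hz=100, max_hz=4000, top=3):
    """Return the `top` strongest bins as (frequency_hz, dB relative to the peak)."""
    powers = [
        (f, goertzel_power(samples, sample_rate, f))
        for f in range(step_hz, max_hz + 1, step_hz)
    ]
    peak = max(p for _, p in powers)
    ranked = sorted(powers, key=lambda fp: fp[1], reverse=True)[:top]
    return [(f, round(10.0 * math.log10(p / peak), 1)) for f, p in ranked]


def make_test_tone(components, sample_rate=16000, num_samples=1600):
    """Hypothetical test signal: a sum of sines given as (freq_hz, amplitude) pairs."""
    return [
        sum(amp * math.sin(2 * math.pi * freq * i / sample_rate) for freq, amp in components)
        for i in range(num_samples)
    ]


if __name__ == "__main__":
    # Illustrative "ee"-like peaks; these are not measurements of any real synth.
    tone = make_test_tone([(300, 1.0), (2300, 0.5), (3000, 0.25)])
    for freq_hz, rel_db in describe_spectrum(tone, 16000):
        print(f"{freq_hz} Hz: {rel_db} dB relative to the strongest peak")
```

Running the same report on two synths saying the same steady vowel would give numbers to compare instead of words like "rounder" or "brighter".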
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@jscholes @cachondo @Tamasg @ZBennoui The other thing that makes this super, super hard is that there are like nine different systems, and all of them need tuning. And it's impossible to ask people who haven't spent pretty much four days straight thinking exclusively about this for feedback on a particular system, because they all work together to make up the voice, and you can't know where any given issue comes from. There's the rules for going from text to IPA phonemes. Then the rules for determining the way IPA phonemes are actually voiced and fit together. And then there's the intonation table. And then there's the two systems that actually make the sound. Right now I'm mostly looking at the system that actually makes the sound, IE when you do "aaaaaaaaaaaaa" or "eeeeeeeeeeeeeeeee", because that's still not right. But because it's an entire voice, it's even hard for me to separate my own perceptions and fix anything.

The important thing to remember is that Eloquence began development in 1982, with a team of about a dozen researchers. It wasn't in the state we know it until around 2002. We have existing research to build on, but no funding and fewer people, and no PhD-level speech researchers. So actually doing this, even with the help of AI, is a 20-30 year project before we get close to Eloquence levels. Because we have something that "works" and improves step by step, it's easy to lose sight of the size of the problem we're taking on, because it feels like we should be able to get there in a month or two. But that's not realistic.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@ginsenshi @Tamasg @ZBennoui Hmmm a bit, yeah.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@MariahL @cachondo @Tamasg @jscholes @ZBennoui I hear where you're coming from. It needs to be...rounder or wider or more relaxed or something. But I'm at the point where every change I make causes it to sound worse or introduces strange new issues.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@Tamasg @cachondo @jscholes @ZBennoui Aww. This is a super hard problem. Especially because nobody can articulate what they want, we all just know when it's wrong.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@Tamasg @cachondo @ZBennoui Yup. And I find I have to swap back to eloquence frequently, or else I lose my way completely, and everything starts to sound fine to me.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@jscholes @cachondo @Tamasg @ZBennoui So some of the issues you're identifying have to do with the espeak phonemizer, and the way phonemes are tuned. We have a lot of work to do there, too. But I feel like if we can get the sound of the voice correct, that will be a lot easier. For example, compare speech player and eloquence saying "eeeeeeeeee". Side by side, it's clear something is still not right with speech player. It needs to be...rounder and brighter? And...those are not useful terms because I'm still struggling to define exactly what I mean by them haha.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@ZBennoui @Tamasg Can you articulate what you dislike about it? I'm still not a fan the way I am of eloquence, but it's getting harder and harder for me to define why, so we can actually fix it.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@FastSM Loving the client. Works perfectly with iceshrimp as my server.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
@we_are_spc @jaybird110127 It's still in beta.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
1mo
On one hand, I absolutely love and adore pull requests. On the other hand, they make me realize just how bad I am at pretending to be a developer. Anyway, if you use 2026, here's another release of unspoken-ng that fixes Firefox errors while also making everything better because I am apparently incapable of correctly thinking through the effect of threads. You should upgrade ASAP if you use this addon: github.com/fastfinge/unspoken-ng/releases/tag/v1.0.4