New, experimental build of SpeechPlayer. I'd like those who are brave to test it; it aims for a new, more Eloquence-like sound. A shout-out goes to @fastfinge for helping to get this idea going and for collaborating on the repository. eurpod.com/synths/nvSpeechPlayer-experimental.nvda-addon The big change is that the voice source is no longer a sawtooth wave. Instead, it now uses an asymmetric cosine glottal-flow pulse (a pitch-synchronous "glottal pulse train"): glottal flow pulses, not continuous oscillator shapes like triangle, saw, or square waves. This has allowed us to achieve a much smoother voice with clearer consonants, while keeping the familiarity of the voice people know.
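To make the sawtooth-versus-glottal-pulse distinction concrete, here is a minimal Python sketch. It assumes a Rosenberg-style pulse (a raised-cosine opening phase, a quarter-cosine closing phase, then a closed phase with zero flow); the function names and the `open_quotient`/`speed_quotient` defaults are illustrative assumptions, not SpeechPlayer's actual code or parameters.

```python
import math


def glottal_pulse(phase, open_quotient=0.6, speed_quotient=2.0):
    """Glottal flow at one point in the pitch period, phase in [0, 1).

    Assumed Rosenberg-style asymmetric cosine pulse (not necessarily
    SpeechPlayer's exact formula):
    - opening: raised cosine rising from 0 to 1
    - closing: quarter cosine falling from 1 back to 0
      (faster than the rise when speed_quotient > 1)
    - closed: no airflow for the rest of the period
    """
    t_rise = open_quotient * speed_quotient / (1.0 + speed_quotient)
    t_fall = open_quotient - t_rise
    if phase < t_rise:
        return 0.5 * (1.0 - math.cos(math.pi * phase / t_rise))
    if phase < open_quotient:
        return math.cos(0.5 * math.pi * (phase - t_rise) / t_fall)
    return 0.0


def pulse_train(f0_hz, sample_rate, n_samples, **shape):
    """Pitch-synchronous source: the pulse restarts at every glottal cycle."""
    out, phase = [], 0.0
    step = f0_hz / sample_rate
    for _ in range(n_samples):
        out.append(glottal_pulse(phase, **shape))
        phase += step
        if phase >= 1.0:
            phase -= 1.0
    return out


def sawtooth(f0_hz, sample_rate, n_samples):
    """Old-style source for comparison: a continuous ramp from -1 up to 1."""
    step = f0_hz / sample_rate
    return [2.0 * ((i * step) % 1.0) - 1.0 for i in range(n_samples)]
```

For example, `pulse_train(120, 16000, 16000)` would produce one second of a 120 Hz source that, in a formant synthesizer, would feed the resonator filters. The sawtooth jumps abruptly once per period, which puts comparatively more energy into the upper harmonics; the pulse is continuous and has a closed phase, which is one plausible reason the new source sounds smoother.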
@ZBennoui@Tamasg Can you articulate what you dislike about it? I'm still not as much of a fan of it as I am of Eloquence, but it's getting harder and harder for me to define why, so that we can actually fix it.
@fastfinge@ZBennoui@Tamasg I wish I could say why, too. There's something... forceful about the way it says everything; it has a harsher sound than Eloquence. But my brain just isn't coming up with the right technical terms.
@cachondo@fastfinge@ZBennoui@Tamasg I struggled to understand it. It was relatively sibilant, bassy, nasal, and, as you said, forceful. "Folks" sounded more like "follks", and "here" was cut off.
@jscholes@cachondo@Tamasg@ZBennoui So some of the issues you're identifying have to do with the eSpeak phonemizer and the way phonemes are tuned. We have a lot of work to do there, too. But I feel like if we can get the sound of the voice correct, that will be a lot easier. For example, compare SpeechPlayer and Eloquence saying "eeeeeeeeee". Side by side, it's clear something is still not right with SpeechPlayer. It needs to be... rounder and brighter? And... those aren't useful terms, because I'm still struggling to define exactly what I mean by them, haha.
@fastfinge@jscholes@cachondo@ZBennoui I think it's the phonemizer, yes, but also the language-specific phonetic rules that I and others tune, like how we got the word "start" to stop sounding disjointed, like "st-ah-rt", as it did in earlier builds. Does nobody seriously give me any credit for improvements? Do people only want to complain about how it's drifting and sounding shittier? Honestly, I get more of that feedback, and each time it makes me want to just give up on this entire thing. If you really hate it, then fix it yourself; don't put that onus all on me. Going away for the rest of the day. I'm super sad.
@Tamasg@fastfinge@cachondo@ZBennoui The thread specifically asked people to try and articulate how they felt about the voice, so I did. You are publicly burning yourself out on this and other projects in a way that constantly makes me want to tell you to get some sleep and look after yourself, so maybe temporarily stepping away would be a good thing.
@jscholes@cachondo@Tamasg@ZBennoui The other thing that makes this super, super hard is that there are like nine different systems, and all of them need tuning. And it's impossible to ask people who haven't spent pretty much four days straight thinking exclusively about this for feedback on a particular system, because they all work together to make up the voice, and you can't know where any given issue comes from. There are the rules for going from text to IPA phonemes. Then there are the rules for determining how IPA phonemes are actually voiced and fit together. Then there's the intonation table. And then there are the two systems that actually make the sound. Right now I'm mostly looking at the system that actually makes the sound, i.e. what you hear when it says "aaaaaaaaaaaaa" or "eeeeeeeeeeeeeeeee", because that's still not right. But because it's an entire voice, it's hard even for me to separate my own perceptions and fix anything.
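One way to make the "which system caused this?" problem more tractable is to keep every stage's intermediate output so it can be read and compared. The toy pipeline below is a hypothetical sketch: the lexicon, durations, and pitch contour are made-up placeholders standing in for the real text-to-IPA rules, phoneme tables, and intonation table, and the sound-generating stages are left out entirely.

```python
from dataclasses import dataclass, replace

# Hypothetical toy lexicon standing in for the real text-to-IPA rules.
TOY_LEXICON = {"folks": ["f", "oʊ", "k", "s"], "here": ["h", "ɪ", "ɹ"]}
TOY_VOWELS = {"oʊ", "ɪ"}


@dataclass(frozen=True)
class Frame:
    phoneme: str
    duration_ms: int
    pitch_hz: float = 0.0


def text_to_ipa(text):
    """Stage 1: text -> IPA phonemes (unknown words become '?')."""
    return [p for w in text.lower().split() for p in TOY_LEXICON.get(w, ["?"])]


def ipa_to_frames(phonemes):
    """Stage 2: how each phoneme is voiced; here only a placeholder duration."""
    return [Frame(p, 120 if p in TOY_VOWELS else 60) for p in phonemes]


def apply_intonation(frames, start_hz=120.0, end_hz=90.0):
    """Stage 3: a simple falling pitch contour, returned as new frames."""
    n = max(len(frames) - 1, 1)
    return [replace(f, pitch_hz=start_hz + (end_hz - start_hz) * i / n)
            for i, f in enumerate(frames)]


STAGES = [("text_to_ipa", text_to_ipa),
          ("ipa_to_frames", ipa_to_frames),
          ("apply_intonation", apply_intonation)]


def run_with_trace(text, stages=STAGES):
    """Run every stage in order, keeping each intermediate result."""
    value, trace = text, []
    for name, stage in stages:
        value = stage(value)
        trace.append((name, value))
    return value, trace
```

Calling `run_with_trace("Folks here")` returns the final frames plus a list of `(stage name, output)` pairs, so a misheard word can be traced to the first stage whose output already looks wrong. Frames are immutable and each stage returns new ones, so a later stage cannot silently rewrite what an earlier stage recorded in the trace.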
The important thing to remember is that Eloquence began development in 1982, with a team of about a dozen researchers. It wasn't in the state we know it until around 2002. We have existing research to build on, but no funding, fewer people, and no PhD-level speech researchers. So actually doing this, even with the help of AI, is a 20-to-30-year project before we get close to Eloquence levels. Because we have something that "works" and improves step by step, it's easy to lose sight of the size of the problem we're taking on, because it feels like we should be able to get there in a month or two. But that's not realistic.
@cachondo@Tamasg@jscholes@ZBennoui We also lack tools as blind people. For example, I wish I could visually examine the waveform and spectrogram of sounds. A useful step here would be to get Eloquence to generate a pure, single-note open "aaa" or "eee" sound, and understand the shape of the resulting waveform. But there are just no accessible tools to do that. So instead we have to go by listening and guesswork.
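A partial, text-only substitute for looking at a spectrogram is to measure how loud each harmonic of a sustained vowel is and print the result, which a screen reader can read line by line. This is a hedged sketch, not an existing tool: it assumes the vowel has been saved as a 16-bit mono WAV file and that its pitch (`f0_hz`) is known and steady, and the function names are our own. It uses only Python's standard library.

```python
import math
import struct
import wave


def read_mono_16bit(path):
    """Read a 16-bit mono PCM WAV file; returns (sample_rate, samples in [-1, 1))."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, [s / 32768.0 for s in samples]


def harmonic_levels(samples, rate, f0_hz, max_harmonics=30):
    """Level of each harmonic k*f0 in dB, relative to the loudest harmonic.

    Assumes a steady vowel whose pitch is f0_hz throughout the slice.
    """
    n = len(samples)
    # Hann window: reduces leakage from one harmonic into its neighbours.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(samples)]
    amps = []
    for k in range(1, max_harmonics + 1):
        freq = k * f0_hz
        if freq >= rate / 2:  # stop at the Nyquist frequency
            break
        w = 2 * math.pi * freq / rate
        # Correlate with a cosine and a sine at this harmonic (one DFT bin).
        re = sum(x * math.cos(w * i) for i, x in enumerate(windowed))
        im = sum(x * math.sin(w * i) for i, x in enumerate(windowed))
        amps.append((freq, math.hypot(re, im)))
    peak = max(a for _, a in amps) or 1e-12
    return [(freq, 20 * math.log10(max(a, 1e-12) / peak)) for freq, a in amps]


def describe(levels):
    """Screen-reader-friendly text: one line per harmonic."""
    return "\n".join(f"{freq:.0f} Hz: {db:.1f} dB" for freq, db in levels)
```

For example, with a hypothetical recording `eloquence_eee.wav` of Eloquence holding "eee" at about 110 Hz, `rate, x = read_mono_16bit("eloquence_eee.wav")` followed by `print(describe(harmonic_levels(x[:4096], rate, 110)))` would list one line per harmonic. Formants show up as groups of harmonics that are louder than their neighbours, so running the same vowel through SpeechPlayer and comparing the two lists gives a rough comparison without needing to see a plot. If the real pitch drifts away from `f0_hz`, the upper harmonics will be measured off-peak, so a short, steady slice works best.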