🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
2h
@jscholes @cachondo @Tamasg @ZBennoui The other thing that makes this super, super hard is that there are like nine different systems, and all of them need tuning. And it's impossible to ask people for feedback on a particular system unless they've spent pretty much four days straight thinking exclusively about this, because the systems all work together to make up the voice, and you can't know where any given issue comes from. There are the rules for going from text to IPA phonemes. Then the rules for determining the way IPA phonemes are actually voiced and fit together. And then there's the intonation table. And then there are the two systems that actually make the sound. Right now I'm mostly looking at the system that actually makes the sound, i.e. when you do "aaaaaaaaaaaaa" or "eeeeeeeeeeeeeeeee", because that's still not right. But because it's an entire voice, it's hard even for me to separate my own perceptions and fix anything.
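To make the layering concrete, here is a rough sketch of the first three stages described above (text to IPA, per-phoneme voicing parameters, intonation). Everything in it is an assumption for illustration: the function names, the tiny lookup tables, and the numbers are placeholders, not the project's actual rules or data, and the two sound-generating systems are left out entirely.

```python
from dataclasses import dataclass

# Stage 1 (hypothetical): text -> IPA phonemes. A real rule set is far larger.
TEXT_TO_IPA = {"a": "ɑ", "e": "i"}

# Stage 2 (hypothetical): how each phoneme is voiced.
# Values are rough illustrative formant frequencies (Hz) and a duration (ms).
PHONEME_PARAMS = {
    "ɑ": (730, 1090, 120),
    "i": (270, 2290, 110),
}


@dataclass
class Frame:
    phoneme: str
    f1_hz: int
    f2_hz: int
    duration_ms: int
    pitch_hz: float


def text_to_phonemes(text):
    """Map each known letter to one IPA phoneme; unknown letters are skipped."""
    return [TEXT_TO_IPA[ch] for ch in text.lower() if ch in TEXT_TO_IPA]


def apply_intonation(phonemes, base_pitch_hz=110.0, fall=0.9):
    """Stage 3 (hypothetical): attach a steadily falling pitch to each phoneme."""
    frames = []
    pitch = base_pitch_hz
    for phoneme in phonemes:
        f1, f2, duration = PHONEME_PARAMS[phoneme]
        frames.append(Frame(phoneme, f1, f2, duration, pitch))
        pitch *= fall
    return frames


def plan_utterance(text):
    """Run the stages in order; the result would be handed to the sound generator."""
    return apply_intonation(text_to_phonemes(text))


if __name__ == "__main__":
    for frame in plan_utterance("ae"):
        print(frame)
```

Even in this toy version, if the final "aaaa" sounds wrong, the cause could be the letter-to-phoneme table, the formant numbers, the pitch contour, or the synthesizer that consumes the frames, and a listener only ever hears the combined result.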

The important thing to remember is that Eloquence began development in 1982, with a team of about a dozen researchers. It wasn't in the state we know it until around 2002. We have existing research to build on, but no funding, fewer people, and no PhD-level speech researchers. So actually doing this, even with the help of AI, is a 20-30 year project before we get close to Eloquence levels. Because we have something that "works" and improves step by step, it's easy to lose sight of the size of the problem we're taking on, because it feels like we should be able to get there in a month or two. But that's not realistic.