4mo
Tagging @Tamasg odorediamanka600-source/FYLs-G2P: A lightweight hybrid G2P engine with less than 1.8M parameters and can be deployed on any devices (almost) github.com/odorediamanka600-source/FYLs-G2P
Tamas G @Tamasg@mindly.social
4mo
@fastfinge oooh this looks really neat. Only English - other languages would still need eSpeak or another solution. Also, no sentence-level prosody - FYLs-G2P doesn't output intonation/prosody markers. Hmm.
4mo
@Tamasg Yeah, not quite what we need. But a step in the right direction...
Tamas G @Tamasg@mindly.social
4mo
@fastfinge Also, we'd need to bundle onnxruntime for older NVDA versions (maybe easier as a DLL module rather than the full Python bloat?). This would be smaller than your Gruut experiment: I'm looking at about 11 MB for the lightweight model, another 15-20 MB for the onnxruntime DLLs, and another 20 MB for numpy, which it also requires. So in total the add-on would become 80 MB.
We could add a mechanism that uses this G2P for English only, and other G2Ps or eSpeak for foreign languages.
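Here's a minimal sketch of that routing idea, assuming each backend can be wrapped as a plain function from text to a phoneme string. `make_router`, `english_g2p` and `espeak_g2p` are hypothetical names with placeholder bodies; they are not FYLs-G2P's or eSpeak's real APIs.

```python
from typing import Callable, Dict

# A G2P backend maps text to a phoneme string (hypothetical interface).
G2P = Callable[[str], str]


def make_router(backends: Dict[str, G2P], fallback: G2P) -> Callable[[str, str], str]:
    """Build a dispatcher that picks a G2P backend by language code."""
    def route(text: str, lang: str) -> str:
        # Use only the primary subtag, so "en-US" and "en-GB" both map to "en".
        primary = lang.split("-")[0].lower()
        # Languages without a dedicated backend fall through to `fallback`.
        return backends.get(primary, fallback)(text)
    return route


# Placeholder backends standing in for FYLs-G2P and an eSpeak wrapper.
def english_g2p(text: str) -> str:
    return f"[en-model] {text}"


def espeak_g2p(text: str) -> str:
    return f"[espeak] {text}"


route = make_router({"en": english_g2p}, fallback=espeak_g2p)
print(route("hello", "en-US"))  # [en-model] hello
print(route("szia", "hu"))      # [espeak] szia
```

Keying on the primary language subtag keeps regional variants like en-US and en-GB on the English model without listing each one.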
4mo
@Tamasg The lack of prosody markers is a blocker, though.
Tamas G @Tamasg@mindly.social
4mo
@fastfinge OMG, the lag on this is horrendous when reading longer-chunked sentences, wow! But it definitely works!
4mo
@Tamasg It also needs some phoneme tuning: "thread" and "threat" sound almost identical.
Tamas G @Tamasg@mindly.social
4mo
@fastfinge And it has zero tuning for single-letter names, which is great. I realized this really quickly: "G" is pronounced as "gwee" and "h" as "hu", haha! So yeah, we'd have to override every single letter of the English alphabet with a normalization rule, yikes.
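A rough sketch of such a normalization rule, assuming it runs on the text before the G2P sees it. The spellings in `LETTER_NAMES` are illustrative guesses that would need tuning by ear, and `spell_single_letters` is a hypothetical helper, not part of FYLs-G2P.

```python
import re

# Illustrative letter-name spellings (assumptions; tune each one by ear).
LETTER_NAMES = {
    "a": "ay", "b": "bee", "c": "see", "d": "dee", "e": "ee", "f": "eff",
    "g": "gee", "h": "aitch", "i": "eye", "j": "jay", "k": "kay", "l": "ell",
    "m": "em", "n": "en", "o": "oh", "p": "pee", "q": "cue", "r": "ar",
    "s": "ess", "t": "tee", "u": "you", "v": "vee", "w": "double you",
    "x": "ex", "y": "why", "z": "zee",
}

# A lone ASCII letter, not touching other word characters or apostrophes,
# so the "t" in "don't" or the "s" in "it's" is left alone.
_SINGLE_LETTER = re.compile(r"(?<![\w'])([A-Za-z])(?![\w'])")


def _letter_name(match: re.Match) -> str:
    letter = match.group(1)
    # Keep the lowercase article "a"; everything else gets its spoken name.
    if letter == "a":
        return letter
    return LETTER_NAMES[letter.lower()]


def spell_single_letters(text: str) -> str:
    """Replace isolated letters with their names before the G2P runs."""
    return _SINGLE_LETTER.sub(_letter_name, text)


print(spell_single_letters("Press G then h"))  # Press gee then aitch
```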
4mo
@Tamasg I suspect you're going to wind up training your own G2P model. Eloquence can already output its phonemes, so you'd just have to write a script to convert from Eloquence to IPA, and then you could generate a bunch of training data.
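A sketch of what that conversion script could look like, assuming Eloquence's phoneme output can be captured as space-separated symbols. The entries in `SYMBOL_TO_IPA` are placeholders, not Eloquence's actual phoneme inventory, and the tab-separated word/IPA output is just one simple layout for training pairs.

```python
import csv
import io
from typing import Dict, Iterable, TextIO, Tuple

# HYPOTHETICAL symbol table: placeholder keys, NOT Eloquence's real phoneme
# set, which would have to be filled in from its documentation.
SYMBOL_TO_IPA: Dict[str, str] = {
    "th": "θ", "dh": "ð", "sh": "ʃ", "ng": "ŋ",
    "iy": "i", "ih": "ɪ", "eh": "ɛ", "ae": "æ", "aa": "ɑ", "ah": "ʌ",
    "r": "ɹ", "t": "t", "d": "d", "s": "s", "k": "k", "p": "p", "b": "b",
    "1": "ˈ",  # placeholder primary-stress marker
}


def to_ipa(phonemes: str) -> str:
    """Map a space-separated phoneme string to IPA, failing loudly on unknowns."""
    try:
        return "".join(SYMBOL_TO_IPA[sym] for sym in phonemes.split())
    except KeyError as err:
        raise ValueError(f"no IPA mapping for symbol {err.args[0]!r}") from None


def write_training_pairs(pairs: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write one word<TAB>IPA row per (word, phonemes) pair."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for word, phonemes in pairs:
        writer.writerow([word, to_ipa(phonemes)])


buf = io.StringIO()
write_training_pairs([("thread", "th r 1 eh d"), ("threat", "th r 1 eh t")], buf)
print(buf.getvalue(), end="")
```

Raising on an unknown symbol, rather than skipping it, keeps gaps in the mapping table from silently corrupting the training data.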
Tamas G @Tamasg@mindly.social
4mo
@fastfinge yeah but, like, lag. I'm noticing it's really, really bad with full sentences, because everything gets pushed to the G2P. I actually don't think these are viable: they rely too much on onnxruntime, and they'll be even slower on worse CPUs. What if someone runs it on something as low-end as an Intel Atom? It's going to take 10 seconds to process the ONNX conversion part. Hmm.
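One common mitigation (not something FYLs-G2P is known to do) is to go word by word: serve lexicon hits directly and cache the neural model's output for OOV words, so repeated words in long sentences only hit ONNX once. A minimal sketch; `LEXICON` and `neural_g2p` are hypothetical placeholders. It only avoids repeated work: the first pass over a new word still costs a full model call, so on its own it wouldn't rescue an Intel Atom.

```python
from functools import lru_cache
from typing import Dict

# Hypothetical lexicon entries; the real engine ships its own word list.
LEXICON: Dict[str, str] = {"the": "ðə", "and": "ænd"}

model_calls = 0  # counts how often the slow model actually runs


def neural_g2p(word: str) -> str:
    """Placeholder for the expensive ONNX model call on one OOV word."""
    global model_calls
    model_calls += 1
    return f"<{word}>"


@lru_cache(maxsize=4096)
def word_to_phonemes(word: str) -> str:
    # Lexicon hits skip the model; OOV words run it once, then come from the cache.
    if word in LEXICON:
        return LEXICON[word]
    return neural_g2p(word)


def sentence_to_phonemes(sentence: str) -> str:
    # Naive whitespace split; punctuation handling is left out of this sketch.
    return " ".join(word_to_phonemes(w) for w in sentence.lower().split())


print(sentence_to_phonemes("The dropdown and the dropdown"))  # ðə <dropdown> ænd ðə <dropdown>
print(model_calls)  # 1
```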
4mo
@Tamasg I've also been meaning to look into this. They advertise faster-than-realtime streaming TTS on CPU, with 100 ms of latency or less: github.com/kyutai-labs/pocket-tts?tab=readme-ov-file
Tamas G @Tamasg@mindly.social
4mo
@fastfinge I guess it's not worth it? Some folks already made an NVDA add-on in this AudioGames forum topic: forum.audiogames.net/topic/58526/pocket-tts/
4mo
@Tamasg Right, but they just used Codex and probably bundled all of torch. There are onnxruntime versions available, and compiled versions in Rust.
Tamas G @Tamasg@mindly.social
4mo
@fastfinge Ah, darn. It not being multilingual kind of stings. I think that's why I built Speechbox. Hungary has as much of an "only commercial TTS is available, and it sounds horrible with some words" problem as we do with the antiquated Eloquence problem for US English. People there only have commercial TTS options and/or eSpeak for their screen readers. JAWS switched to the Vocalizer Hungarian voice over the homegrown Profivox voice. Americans would cry if they had Hungary's TTS situation, lol.
4mo
@Tamasg "Well they should just learn English. Idiots. They should stop being blind while they're at it." -- probably some big tech CEO, somewhere
Tamas G @Tamasg@mindly.social
4mo
@fastfinge But I mean, look at this list of bad outputs. The lexicon has decent coverage of common words, but the neural model for OOV words really struggles with compound words and tech terms. And even some in-lexicon words produce weird outputs.
• equals → ˈikwᵊlz (OOV, wrong - sounds like "eekwulz")
• dropdown → truncated (final W missing)
• firefox → truncated
• bluetooth → truncated
• youtube → truncated (jˈutˌu)
• github → truncated (ɡˈɪt)
• stackoverflow → truncated
• localhost → wrong
• ctrl → garbage
• alt → garbage (ˈI = "eye"??)
• spacebar → wrong (spˈAsbˌɑɹ = "spaysbahr")
• wifi → wrong stress pattern
• combobox → OOV
• focusable → OOV, wrong
My final conclusion, sadly: this is a research-quality G2P, not a production-quality one for screen reader use.
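Many of these could be patched with a respelling pass before the G2P, splitting compounds into parts the lexicon is more likely to cover and expanding abbreviations. A sketch, assuming plain whole-word substitution; the `RESPELLINGS` entries are illustrative and untested against FYLs-G2P, and `respell` is a hypothetical helper.

```python
import re
from typing import Dict

# Illustrative respellings (assumptions, not a tested list): split compounds
# into likely in-lexicon parts and expand abbreviations.
RESPELLINGS: Dict[str, str] = {
    "ctrl": "control",
    "github": "git hub",
    "youtube": "you tube",
    "localhost": "local host",
    "dropdown": "drop down",
    "spacebar": "space bar",
    "combobox": "combo box",
    "stackoverflow": "stack overflow",
}

# One case-insensitive alternation over all keys, longest first, whole words only.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, RESPELLINGS), key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def respell(text: str) -> str:
    """Rewrite known problem terms before the text reaches the G2P."""
    return _PATTERN.sub(lambda m: RESPELLINGS[m.group(1).lower()], text)


print(respell("Press Ctrl to open the GitHub dropdown"))
# Press control to open the git hub drop down
```

Matching whole words only means a longer word that merely starts with a key (say, "githubber") passes through untouched.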
4mo
@Tamasg Also, another stress test I've been using on text-to-speech systems lately is the name of my friend "Hrvoje". It's Croatian, and pronounced "her voy yay". Every AI text-to-speech system does something new and awful with it. So does every Klatt system, haha.
Tamas G @Tamasg@mindly.social
4mo
@fastfinge Oh my gosh, I know that person. Cattic is the last name, I believe. Yes. I'm very glad you two are friends :) Yep, met in high school, I think, or something like that, back in the good old days of Live Messenger. Haha.
4mo
@Tamasg Yeah, that's the guy LOL. We're reinforcing the stereotype that all blind people know each other!
Tamas G @Tamasg@mindly.social
4mo
@fastfinge hahahahahaaha, well, I'm always more surprised when it's someone international, because honestly the blind community operates in little cliques too, especially ones that don't reach far across international boundaries. Arguably the web has improved this a lot in some groups, but not all.
4mo
@Tamasg Yeah true. I remember one time I was on the train, and a random sighted person said to me "Hey! There's another blind person just down the car! Don't you two know each other? Why aren't you sitting together? Here, I'll take you to him." I was like "No, I'm traveling alone. We're not together." Then the other blind person overheard the conversation, and it turned out we'd known each other for years. So we sat together and chatted for the rest of the train ride. I was so tempted to pretend I didn't know him at all, just so I didn't validate this random sighted person's stereotypes.
Tamas G @Tamasg@mindly.social
4mo
@fastfinge lol, the irony in that story isn't lost on me at all! Hahahaa, too funny. I had similar experiences when I was at Microsoft, because I knew blind people who worked there. At times I'd go out to a bar or somewhere and the same thing would happen: a sighted person asks if I know the blind man sitting across the bar, I'm like "probably not, ha, ha!", but we start chatting and it's one of my blind coworkers who happened to be in the same bar. So yeah, that feeling's real :D
@Tamasg The worst thing was that we both got off the train at the same stop. We were going different places, at least. But still LOL
Martin @mcourcel@allovertheplace.ca
4mo
@fastfinge @Tamasg Yup, I have so many similar stories. Lolol!
@mcourcel @Tamasg We both lived in Toronto for years. So you're lucky you weren't the star of this one! LOL
Martin @mcourcel@allovertheplace.ca
4mo
@fastfinge @Tamasg Hehehe tru dat! There's a blind dude who supposedly looks like me. I was walking near the CNIB once, and this guy kept wanting me to come into his house to tune his piano. I finally clued in and told him that was my brother. Turns out I know him, so I called him to tell him we'd been mistaken for each other again.