🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4mo
Tagging @Tamasg odorediamanka600-source/FYLs-G2P: A lightweight hybrid G2P engine with less than 1.8M parameters and can be deployed on any devices (almost) github.com/odorediamanka600-source/FYLs-G2P
Tamas G @Tamasg@mindly.social
4mo
@fastfinge oooh this looks really neat. Only English - other languages would still need eSpeak or another solution. Also, no sentence-level prosody - FYLs-G2P doesn't output intonation/prosody markers. Hmm.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4mo
@Tamasg Yeah, not quite what we need. But a step in the right direction...
Tamas G @Tamasg@mindly.social
4mo
@fastfinge Also, we'd be bundling Onnxruntime for older NVDA (but maybe that's easier as a DLL module, not the full Python bloat?). This would be smaller than your Gruut experiment: I'm looking at about 11 MB for the lightweight model, then another 15-20 MB for the onnxruntime DLLs, then another 20 MB for numpy, which it also requires. So in total the add-on would become 80 MB.
We could do a mechanism where we use this G2P for English only, but other G2Ps or eSpeak for foreign languages.
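
A rough sketch of that per-language mechanism, in case it helps. Everything here is an assumption for illustration: `make_router`, `neural_english_g2p` and `espeak_g2p` are hypothetical names, and the two backends are stand-ins that only tag their input rather than calling FYLs-G2P or eSpeak.

```python
from typing import Callable, Dict

# A G2P backend takes text and returns a phoneme string.
G2PBackend = Callable[[str], str]


def make_router(backends: Dict[str, G2PBackend],
                fallback: G2PBackend) -> Callable[[str, str], str]:
    """Return a function that picks a backend by primary language subtag."""
    def phonemize(text: str, lang: str) -> str:
        # "en-US" and "en_GB" both reduce to "en".
        primary = lang.replace("_", "-").split("-")[0].lower()
        backend = backends.get(primary, fallback)
        return backend(text)
    return phonemize


# Stand-in backends (hypothetical) so the sketch runs without any model.
def neural_english_g2p(text: str) -> str:
    return f"<neural:{text}>"


def espeak_g2p(text: str) -> str:
    return f"<espeak:{text}>"


# English goes to the neural model, everything else falls back.
phonemize = make_router({"en": neural_english_g2p}, fallback=espeak_g2p)
```

With this shape, adding another language-specific G2P later is just one more entry in the dictionary, and anything unlisted keeps going through the fallback.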
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4mo
@Tamasg The lack of prosody markers is a blocker, though.
Tamas G @Tamasg@mindly.social
4mo
@fastfinge OMG, the lag on this is horrendous when reading longer-chunked sentences, wow! But it definitely works!
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4mo
@Tamasg It also needs some phoneme tuning: "thread" and "threat" sound almost identical.
Tamas G @Tamasg@mindly.social
4mo
@fastfinge and it has zero tuning for single-letter names, which is great. I realized this really quickly, "G" is pronounced as "gwee" and "h" as "hu" haha! So yeah, we'd have to override every single English language letter as a normalization rule, yikes.
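
One way the letter override could look, as a minimal sketch. It assumes the override only applies when the whole utterance is a single letter (as in character-by-character navigation), so the words "a" and "I" inside a sentence are left alone. The respellings in `LETTER_NAMES` and the function name are made up for illustration, not taken from any existing add-on, and would need tuning by ear.

```python
# Hypothetical respellings of English letter names; tune these by ear.
LETTER_NAMES = {
    "a": "ay", "b": "bee", "c": "see", "d": "dee", "e": "ee", "f": "eff",
    "g": "jee", "h": "aitch", "i": "eye", "j": "jay", "k": "kay", "l": "ell",
    "m": "em", "n": "en", "o": "oh", "p": "pee", "q": "cue", "r": "ar",
    "s": "ess", "t": "tee", "u": "you", "v": "vee", "w": "double you",
    "x": "ex", "y": "why", "z": "zee",
}


def normalize_single_letter(text: str) -> str:
    """Respell text that consists of exactly one letter; pass the rest through."""
    stripped = text.strip()
    if len(stripped) == 1:
        # Non-letters (digits, punctuation) are not in the table and are
        # returned unchanged.
        return LETTER_NAMES.get(stripped.lower(), text)
    return text
```

The result would then be handed to the G2P in place of the raw letter, so "G" reaches the model as "jee" instead of being guessed at.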
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4mo
@Tamasg I suspect you're going to wind up training your own G2P model. Eloquence can already output its phonemes, so you'd just have to write a script to convert from Eloquence phonemes to IPA, and then you could generate a bunch of training data.
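
A sketch of what such a conversion script might look like, under heavy assumptions. The source symbols in `SYMBOL_TO_IPA` are invented placeholders, NOT Eloquence's real phoneme inventory, and the tab-separated row format is only a guess at what a training pipeline would want. The part that should carry over is the greedy longest-match loop, which keeps a two-character symbol like "TH" from being read as "T" followed by "H".

```python
from typing import Dict, List

# PLACEHOLDER table: these source symbols are invented for illustration,
# not Eloquence's real phoneme set. Replace with the real mapping.
SYMBOL_TO_IPA: Dict[str, str] = {
    "TH": "θ",
    "T": "t",
    "R": "ɹ",
    "EH": "ɛ",
    "D": "d",
}


def to_ipa(symbols: str, table: Dict[str, str] = SYMBOL_TO_IPA) -> str:
    """Greedy longest-match conversion of a symbol string to IPA."""
    longest = max(len(key) for key in table)
    out: List[str] = []
    i = 0
    while i < len(symbols):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(symbols) - i), 0, -1):
            chunk = symbols[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            raise ValueError(f"unknown symbol at position {i}: {symbols[i:]!r}")
    return "".join(out)


def training_row(word: str, symbols: str) -> str:
    """Format one word/IPA pair as a tab-separated training line."""
    return f"{word}\t{to_ipa(symbols)}"
```

Raising on unknown symbols, rather than skipping them, means gaps in the mapping show up while the data is being built instead of as silent holes in the training set.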
Tamas G @Tamasg@mindly.social
4mo
@fastfinge yeah but, like, lag. I'm noticing it is really, really bad when doing full sentences, because it all gets pushed to the G2P. I actually don't think these models are viable: they rely on Onnxruntime too much and would be even slower on weaker CPUs. What if someone runs it on something as low-end as an Intel Atom? It's going to take 10 seconds to process the ONNX conversion part. Hmm.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4mo
@Tamasg It's possible to get Onnxruntime to be snappy with a correctly optimized model, though. Blastbay TTS uses it.