Tagging @Tamasg odorediamanka600-source/FYLs-G2P: A lightweight hybrid G2P engine with less than 1.8M parameters and can be deployed on any devices (almost) github.com/odorediamanka600-source/FYLs-G2P
@fastfinge oooh this looks really neat. Only English - other languages would still need eSpeak or another solution. Also, no sentence-level prosody - FYLs-G2P doesn't output intonation/prosody markers. Hmm.
@fastfinge also, we'd be bundling onnxruntime (but maybe easier as a DLL module, not full Python bloat?) for older NVDA. This would be smaller than your Gruut experiment: I'm looking at about 11 MB for the lightweight model, then another 15-20 MB for the onnxruntime DLLs, then another 20 MB for numpy, which it also requires. So in total the add-on would become 80 MB. We could do a mechanism where we use this G2P for English only, but other G2Ps or eSpeak for foreign languages.
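A rough sketch of that "English only, eSpeak for everything else" mechanism. Both backend functions here are made-up stand-ins, not the real FYLs-G2P or eSpeak APIs; the point is only the dispatch on language code.

```python
from typing import Callable, Dict

# Hypothetical stand-ins: replace with the real FYLs-G2P and eSpeak calls.
def fyls_g2p(text: str) -> str:
    return f"<fyls:{text}>"

def espeak_g2p(text: str) -> str:
    return f"<espeak:{text}>"

# Only languages listed here go to the neural G2P; everything else falls back.
BACKENDS: Dict[str, Callable[[str], str]] = {"en": fyls_g2p}

def phonemize(text: str, lang: str) -> str:
    # Reduce "en-US" or "en_GB" to the base language code "en".
    base = lang.split("-")[0].split("_")[0].lower()
    backend = BACKENDS.get(base, espeak_g2p)
    return backend(text)
```

Adding another language later would just mean one more entry in `BACKENDS`.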
@fastfinge and it has zero tuning for single-letter names, which is great. I realized this really quickly: "G" is pronounced as "gwee" and "h" as "hu" haha! So yeah, we'd have to override every single English letter as a normalization rule, yikes.
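Something like this could be the letter override, run on the text before it reaches the G2P. The respellings in `LETTER_NAMES` are guesses for illustration (only two letters shown), and whether the model reads "gee" or "aitch" correctly is untested.

```python
import re

# Hypothetical respellings; a real table would cover all 26 letters
# (and would probably leave out "a" and "I", which are also words).
LETTER_NAMES = {"g": "gee", "h": "aitch"}

# A lone letter not touching other word characters or an apostrophe,
# so the "m" in "I'm" is left alone.
_SINGLE_LETTER = re.compile(r"(?<![\w'])([A-Za-z])(?![\w'])")

def normalize_letters(text: str) -> str:
    def repl(match: re.Match) -> str:
        letter = match.group(1)
        # Letters without an override pass through unchanged.
        return LETTER_NAMES.get(letter.lower(), letter)
    return _SINGLE_LETTER.sub(repl, text)
```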
@Tamasg I suspect you're going to wind up training your own G2P model. Eloquence can already output its phonemes, so you'd just have to write a script to convert from Eloquence phonemes to IPA, and then you could generate a bunch of training data.
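The conversion script could look roughly like this: a greedy longest-match lookup over a symbol table, then one tab-separated row per word. The table below is a placeholder, not the real Eloquence phoneme inventory, and the tab-separated row format is just an assumption about what a trainer would want.

```python
# Placeholder mapping: NOT the actual Eloquence symbol set.
ELO_TO_IPA = {"h": "h", "E": "ɛ", "l": "l", "oU": "oʊ"}
_MAX_LEN = max(len(symbol) for symbol in ELO_TO_IPA)

def to_ipa(phonemes: str) -> str:
    out = []
    i = 0
    while i < len(phonemes):
        # Try the longest symbol first so "oU" wins over a shorter match.
        for n in range(_MAX_LEN, 0, -1):
            chunk = phonemes[i:i + n]
            if chunk in ELO_TO_IPA:
                out.append(ELO_TO_IPA[chunk])
                i += len(chunk)
                break
        else:
            # Fail loudly rather than emit silently wrong training data.
            raise ValueError(f"unmapped symbol at {i}: {phonemes[i]!r}")
    return "".join(out)

def training_row(word: str, phonemes: str) -> str:
    # One "word<TAB>ipa" line per entry.
    return f"{word}\t{to_ipa(phonemes)}"
```

Raising on unmapped symbols means gaps in the table show up immediately instead of leaking into the dataset.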
@fastfinge yeah but, like, lag. I'm noticing it is really, really bad when doing full sentences, because it all gets pushed to the G2P. I actually don't think these models are viable: they rely too heavily on onnxruntime, and they'd be even slower on weaker CPUs. What if someone runs it on something as low-end as an Intel Atom? It's going to take 10 seconds to process the ONNX conversion part. Hmm.