@fastfinge For many months I have been developing a neural TTS system focused on screen reading, which offers instant responsiveness while maintaining good synthesis quality. And, BTW, using espeak as a phonemizer backend is not recommended at all, as it breaks the text embeddings during model training, especially if we use linguistic information. And please consider avoiding overloading NVDA's Python environment in your add-ons.
@rmcpantoja The issue with not using Espeak is that it makes it impossible to have user dictionaries. When we use a neural network, linguistic rules are no longer deterministic. So it might say a word correctly with one voice, at one time, but not with another voice, or at another time. This makes it impossible for us to correct mispronounced words in a reliable way.
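As a rough illustration of the user dictionary point: a dictionary is essentially a fixed lookup applied to the text before synthesis, so the same input always yields the same correction. The sketch below is only an assumption about how such a step could look. `USER_DICT`, its entries, and `apply_user_dict` are hypothetical names, not part of NVDA or Espeak.

```python
import re

# Hypothetical user dictionary: word -> respelling the user prefers.
USER_DICT = {
    "NVDA": "en vee dee ay",
    "espeak": "ee speak",
}

def apply_user_dict(text, user_dict=USER_DICT):
    """Replace whole-word matches (case-insensitive) with the user's respelling."""
    if not user_dict:
        return text
    # Longest entries first so longer words win over shorter overlapping ones.
    words = sorted(user_dict, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    lookup = {w.lower(): r for w, r in user_dict.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)

corrected = apply_user_dict("NVDA uses espeak")
```

The substitution itself is deterministic. Whether the correction holds in the audio depends on the stage after it: a rule-based phonemizer maps the respelling to the same phonemes every time, while a neural front end may not.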