@polx Maybe, but probably not. Doing that would waste a lot of resources generating text I'm never going to listen to. Think about the average user interface: dozens of menus, toolbars, ads, comments, and so on. Plus, the text changes constantly, even on simple websites. That's not even taking into account websites that just scroll endlessly. It might be possible to create some kind of algorithm to predict the text I'm most likely to want next, but now we've just added another AI on top of the first AI.
I think a better solution might be to make the text-to-speech system run on separate hardware from the computer itself. This is, in fact, how text to speech was done in the past, before computers had multi-channel soundcards. This has a few advantages. First, even if the computer itself is busy, the speech never crashes or falls behind. Second, if the computer crashes, it could actually read out the last error encountered. Third, specialized devices could perhaps be more power- and CPU-efficient.
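To make the idea concrete, here's a rough sketch of what the host side of that setup might look like: the computer just ships characters to a serial-connected synthesizer and gets on with its life. The device path, baud rate, and "send plain text" protocol here are assumptions for illustration, not any particular synthesizer's documented interface.

```python
# Rough sketch: hand text off to an external serial speech synthesizer.
# Assumes pyserial is installed and a device is attached at the
# (hypothetical) path below; real devices have their own protocols.
import serial


def speak(text: str, port: str = "/dev/ttyUSB0", baud: int = 9600) -> None:
    """Send text to the external synthesizer and return immediately.

    The host never renders audio itself, so even if it's bogged down
    (or in the middle of crashing), the device keeps speaking whatever
    it has already received.
    """
    with serial.Serial(port, baud, timeout=1) as synth:
        synth.write(text.encode("ascii", errors="replace") + b"\r\n")


if __name__ == "__main__":
    speak("Kernel panic: attempted to kill init")
```

The point is that the host only pushes characters out a port; the actual synthesis happens on the device's own processor, which is why it can't fall behind or die with the main machine.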
The reason text-to-speech systems became software instead of hardware is largely cost. It's much cheaper to download and install a program than it is to purchase another device. It also means you don't have to carry around another dongle and plug it into the computer.