There's a new product that has been gaining some buzz in the blind community, a Windows app called Guide that uses AI to perform tasks on your computer. It's pitched as a way to get around web accessibility problems in particular. I won't link to the thing itself, because I don't want to give it that validation, but I'll link to a previous discussion thread about it: fed.interfree.ca/notes/a5wf4yss764nf6h7
I've spent some time taking this app apart. The level of shoddy work here is deeply disgusting. 1/?
First, it's an Electron+Python monstrosity. Specifically, the Python backend runs as a web server on the local machine, and the Electron frontend connects to that local web server. On top of the size of Electron itself, the frontend app is about 27 MB, mostly a node_modules tree with no hint of tree-shaking / dead code elimination. The frontend JavaScript code is not minified at all, so once you extract the .asar file, it's easy to look at it. 2/?
The frontend being fully unobfuscated would of course be a good thing if this were supposed to be open source, but it's not. And that frontend seems to be the only part of the program that validates that you have the license/subscription. That's just begging to be cracked. 3/?
But now let's talk about the Python backend. The first obvious question, of course, is what AI model it's using, and whether the inference is done locally or remotely. It's using Claude 3.7 Sonnet with its computer use feature. But here's the really crappy part: the connection to Claude, and to other services like Azure Speech and ElevenLabs (yes, both), is happening on the user's machine, using API keys embedded inside the application. 4/?
To spell it out, the problem with directly connecting to third-party services using API keys inside an application running on a user's machine is that you're just begging to have someone steal those keys and run up your bills. Without having your own server in the mix, there's no hope of reining in that usage of third-party services and tying it to some kind of authorization system. They do have an API server (on Azure) for the license/subscription, but as I said, that's easily circumvented. 5/?
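To make that concrete, here's a rough sketch in Python of the kind of server-side gate I mean. Everything in it is hypothetical: the `UsageGate` class, the `handle_request` function, the token set, and the limits are made up for illustration, and none of it is taken from Guide or from any vendor's SDK. The point is only that the third-party API key lives on the server, and each request has to pass an authorization check and a usage limit before anything is forwarded.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class UsageGate:
    """Hypothetical server-side gate: checks a license token and a rate limit."""

    valid_tokens: Set[str]
    max_requests_per_window: int = 20
    window_seconds: float = 60.0
    _history: Dict[str, List[float]] = field(default_factory=dict)

    def authorize(self, token: str, now: Optional[float] = None) -> bool:
        # Unknown or revoked license tokens are rejected outright.
        if token not in self.valid_tokens:
            return False
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_seconds
        # Keep only the timestamps that still fall inside the sliding window.
        recent = [t for t in self._history.get(token, []) if t > cutoff]
        if len(recent) >= self.max_requests_per_window:
            self._history[token] = recent
            return False
        recent.append(now)
        self._history[token] = recent
        return True


def handle_request(
    gate: UsageGate,
    token: str,
    prompt: str,
    forward: Callable[[str], str],
) -> dict:
    """Forward the prompt only if the gate allows it.

    `forward` stands in for the server's own call to the third-party API,
    made with a key that is never shipped to the client.
    """
    if not gate.authorize(token):
        return {"status": 403, "body": "denied"}
    return {"status": 200, "body": forward(prompt)}
```

With something like this in front of the model, a stolen client can at worst burn through one subscriber's quota, and the vendor can revoke that token without rotating the real API key.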
The Python backend is packaged using PyInstaller. There's 30 MB (compressed) of Python bytecode in the executable, and then there's also an "_internal" directory with tons of dependencies, adding up to about 200 MB (uncompressed), again with no apparent attempt at eliminating dead code in the package. I readily admit that I'm perhaps overly obsessed with trying to make non-bloated software, but come on. 6/?
It wouldn't be right for me to knock the product for the bloat alone. But taken together with the direct use of third-party services in the app on the user's machine, and the actual functionality problems detailed in the thread I linked to, the whole thing smells of something hastily cobbled together to catch a ride on the AI hype train. If this is the accelerated future of software development that businesses want, then as I said, it's deeply disgusting, and kind of scary. 7/7
Perhaps I need to more explicitly call out what is actually the scariest part here: if you use this product, you're letting an application take control of your computer, using the output of a large language model as input. I know better than to describe an LLM as "just" a next-word predictor, because we've all seen how surprisingly powerful that can be. But still, it's all too common for LLMs to output things that don't make sense, especially when venturing outside their training data.
And yes, some of my early work as a young programmer could have been skewered like this. Admittedly I'm saying this from the safe distance of 20 years or so, but honestly, it should have been. College certainly didn't properly slaughter my ego as it should have, as @bcantrill discussed in his "Coming of Age" talk (www.youtube.com/watch?v=VzdVSMRu16g). As long as we don't sink to personal attacks and focus on substantive problems, I think healthy public criticism of publicly released work is OK.
@matt Having exchanged emails with Andrew, Guide’s developer, I really don’t think this is a cash grab. He strikes me as well intentioned. I was unaware of the coding issues, or that it runs an unsecured web server on the user’s machine. From what you can see, is there anything to stop another rogue app from connecting to the web server and telling Claude to execute stuff on the user’s machine? I was only really evaluating this from a functionality perspective. Though I was somewhat surprised that some of the changes I proposed in my email weren’t things he’d thought about previously. This feels to me like good ideas that didn’t get enough time in the oven, combined with lack of experience and good intentions. The telling thing, to me, will be what happens over the coming weeks. Will this go back in the oven for more cooking? Will we see the changes in safety and security that are needed?
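For anyone wondering what a basic defense against that rogue-app scenario could look like, here is a minimal Python sketch, under the assumption that the frontend and backend can share a random secret generated at each launch. Whether Guide does anything like this isn't established in this thread. The function names `new_session_token` and `is_authorized` are hypothetical; the only real APIs used are `secrets.token_urlsafe` and `hmac.compare_digest` from the standard library.

```python
import hmac
import secrets


def new_session_token() -> str:
    """Generate a fresh random secret for one run of the app.

    The assumption is that the launcher hands this to both the local
    server and the frontend through a channel other local apps can't read.
    """
    return secrets.token_urlsafe(32)


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Accept a request only if it carries the per-launch bearer token."""
    supplied = headers.get("Authorization", "")
    prefix = "Bearer "
    if not supplied.startswith(prefix):
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(
        supplied[len(prefix):].encode(), expected_token.encode()
    )
```

A local server that rejects every request failing this check would at least keep an arbitrary process that merely knows the port from driving it. It does nothing against malware that can read the app's memory or files.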