🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
Admin
completely blind computer geek, lover of science fiction and fantasy (especially LitRPG). I work in accessibility, but my opinions are my own, not those of my employer. Fandoms: Harry Potter, Discworld, My Little Pony: Friendship is Magic, Buffy, Dead Like Me, Glee, and I'll read fanfic of pretty much anything that crosses over with one of those.
keyoxide: aspe:keyoxide.org:PFAQDLXSBNO7MZRNPUMWWKQ7TQ
Location
Ottawa
Birthday
1987-12-20
Pronouns
he/him (EN)
xmpp fastfinge@im.interfree.ca
keyoxide aspe:keyoxide.org:PFAQDLXSBNO7MZRNPUMWWKQ7TQ
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
transphobia @andrew @bermudianbrit The way I first noticed it is when Voldemort started getting marked as spelled incorrectly in Microsoft Edge, when it never had been before. I then discovered that it couldn't offer Dumbledore as a suggestion, no matter how close to the correct spelling I got. Then I noticed that modern terms like Kubernetes were also being flagged as misspellings. When I disabled AdGuard DNS and rebooted, the problems went away.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@Genstar @IceWolf This, too. It also feels like...one step away from cryptocurrency mining. Remember when captchas were just to tell humans and computers apart? Then they were to help digitize public domain books for the general good. Then they were to help digitize books for Google's ebook library. Now they're to help train Google's AI to recognize photos! How long until someone goes "Hey, as long as we're making someone's computer do work to prove it's a real person...hmmm...why don't we have them mine some bitcoin for us? For charity! Well, at first..."
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@ktneely Oh, we have a solution for this. Just sign in with your Microsoft(tm) Account. Then we'll store all of your custom dictionary entries on the cloud so they'll always be available on all of your devices! You'll also get more contextually aware and relevant advertisements from our marketing partners based on the terms you use most frequently!
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@ktneely They're probably using en_US.dic. The problem is that new words never get added to it, and the algorithm they're using to detect spelling errors hasn't been updated since forever. Probably Levenshtein distance or something. Why bother doing any better for offline users when the online ones can just use an LLM? "Levenshtein" is a good example, actually. It's a spelling error if I'm offline, but not if I'm online. I assume it has some kind of online names database or something. That shouldn't need to be online, but they have no incentive to make it work offline.
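The post only guesses that the offline checker uses "Levenshtein distance or something." As a minimal sketch of that kind of checker (an assumption, not Edge's actual implementation), the Python below computes classic edit distance and suggests dictionary words within two edits. The `suggest` helper and its `max_distance` parameter are hypothetical. It also illustrates the Dumbledore problem: a word that isn't in the dictionary can never be suggested, however close the typo is.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion, and substitution each cost 1."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: list[str], max_distance: int = 2) -> list[str]:
    """Hypothetical suggester: dictionary words within max_distance edits, closest first."""
    scored = [(levenshtein(word.lower(), w.lower()), w) for w in dictionary]
    return [w for d, w in sorted(scored) if d <= max_distance]
```

With `"Dumbledore"` in the word list, `suggest("Dumbeldore", ["Dumbledore", "dumbbell"])` returns `["Dumbledore"]`; drop it from the list and the same typo gets no suggestion at all, because nothing else is within two edits.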
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf @foxbutt And relays are weird because they're trying to solve the discovery problem. For example, I want to make sure I see all posts with the blind hashtag. But in a distributed system, there's really no good way to do that without giving everyone everything.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf @foxbutt Nah, it depends on how your implementation is configured. Some server owners turn off backfilling because they want to save disk space and don't care about search. And some server owners configure things so that their server will only show your server a certain subset of a user's posts, rather than all of them, when it asks. And then authorized fetch, and how it interacts with blocking and post privacy, adds another layer of complexity.

And, of course, none of this stuff is (or can be) enforced by any kind of technical measure. Someone could easily write or patch an "evil Mastodon" to suck up as many posts as it can, while fooling the other server into thinking the requests are legit. Kind of like how some torrent clients are written to upload as little as possible.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf @foxbutt No, all you have to do is join a relay server. Then it will send you every post from every other server that also joined that relay server. Almost no relay servers currently require approval to join. Also, because of how threading works, once you become aware of a user (perhaps because they boosted you or whatever), most fediverse implementations will happily let you "backfill": i.e., allow your server to download every public post of that user so you can view it locally.
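To make the relay point concrete, here is a toy, in-memory model of the fan-out described above. It is an assumption-laden sketch: real relays speak ActivityPub over signed HTTP requests, and none of that is modelled. The `Server`, `Relay`, `join`, and `publish` names are hypothetical. It shows why joining is enough to receive everything: the relay forwards each post from one member to every other member, and an optional approval list is the only gate.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    timeline: list = field(default_factory=list)  # posts stored locally on this server

    def receive(self, post: str) -> None:
        self.timeline.append(post)


@dataclass
class Relay:
    members: dict = field(default_factory=dict)   # server name -> Server
    require_approval: bool = False                # most relays leave this off
    approved: set = field(default_factory=set)

    def join(self, server: Server) -> bool:
        # The approval list is the only thing standing between a new server and the firehose.
        if self.require_approval and server.name not in self.approved:
            return False
        self.members[server.name] = server
        return True

    def publish(self, origin: Server, post: str) -> None:
        # Fan a public post out to every member except the one that sent it.
        for name, server in self.members.items():
            if name != origin.name:
                server.receive(post)
```

For example, if three servers join an open `Relay()` and one of them publishes a post, the other two both end up with a copy in their local timelines, whether or not anyone on them follows the author.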
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf @foxbutt I dunno. I have no idea who runs tech.lgb or mastodon.online, two servers picked completely at random from the 15,603 instances currently federating with me. The fact that I'm a good person doesn't mean that everyone else on the network is. And let's be real: if I were going hungry and had little or no access to healthcare, and OpenAI said "Hey, buddy, we'll give you a million dollars for that post archive!" how many people would choose to be sick and homeless rather than make that deal?
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf @foxbutt Currently, my small single-user instance is receiving over fifty thousand posts a day. The database currently contains slightly over twenty million posts. And I'm just one dude. If an organization wanted visibility into the entire fediverse, I'm sure they could do a lot better!
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf @foxbutt There are, of course, ways to "fix" this. We could require relay operators to request photo ID from every single server operator who joins the relay. And we could only federate with other servers that are willing to provide their IDs to us. And we'd only allow API access for approved organizations. The cure sounds a lot worse than the problem, to me!
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf @foxbutt Everyone blocks the people who admit they're doing that. Do you really think Google and Meta aren't running some random instance under a quirky name? Do you really think every single server admin will refuse money for access to their timelines? The thing about a distributed system is...it's distributed. Once dozens and dozens of servers have it, you really don't have any hope of controlling where it goes or what happens to it.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf @foxbutt Doesn't matter. Does anyone reading this thread use Chrome? Congrats! We're both in the AI training material.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf @foxbutt Right, and once again, I think DRM is an instructive example here. Companies can ask people not to pirate their stuff, and usually most people will respect that most of the time. But it only takes one! And every single DRM method that attempts to block piracy makes everything both less accessible and worse for everyone.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf @foxbutt Also, if you're using Google Chrome, you're already feeding everyone's everything into an LLM without their consent.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf @foxbutt I'm not at all. But I'm absolutely certain some other company is! All they have to do is spin up an instance and join the major relays. Or just buy access to the local timeline on mastodon.social. My point is that in order to prevent these things, the fediverse would have to block all automated access of any kind: screen readers, third party clients, everything.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@foxbutt @IceWolf Right, because an open internet means that you don't really get to decide what accesses your content: my screen reader, someone on an ebook reader from eight years ago, a smart TV, or a fridge. Abusive bots are a problem, and need to be stopped. But that's to save server resources, rather than to limit how content is used. Because if you limit AI's ability to scrape your content, you will always lock me out. The entire fediverse is wonderful for AI! It's got an open API that my specialized accessibility client can access, and so can any AI training tool. To block the AI, you'd have to take away the API, or put up something that would also block other human and automated uses of it.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf And just like how there is no DRM scheme that can stop someone from copying digital bits, and no encryption scheme with a backdoor that only the good guys can use, there is no way to block AI scrapers, but not screen readers and other accessibility programs. Because they're both just software.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf Twitter and Reddit are more good examples. They decided to shut down their APIs in order to sell content to the AI machine. So most blind people are on Mastodon now. But if the fediverse decides it wants to stop all AI training as well, a side effect of blocking it will be shutting out people using assistive technology. Just like Reddit and Twitter did. Only for the opposite reason.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@IceWolf Right, but as long as capitalism exists, any limits we set to stop one will always be abused to stop the other. As someone with accessibility needs, the only way for me to have a life and job is for the internet to stay completely open. Not just "open to whatever types of use authors and companies feel like granting", because that always excludes accessibility.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
5mo
@foxbutt @IceWolf In general, you can start by tarpitting. And you can also rate-limit by geographic area. For example, 99 percent of my visitors are from the US and Canada. Obviously, I don't want to block the entire rest of the world, but I do have all other countries on a much, much stricter rate limit. There are ways around this if you care. But most people don't; accessibility is a sacrifice they are willing to make on my behalf.

The other problem, of course, is that all of these solutions will block legitimate scripts. For example, the Internet Archive, scripts that mirror resources onto physical media to ship to underdeveloped countries, and the tool I use to download multi-page articles for offline reading on my phone, because the subway doesn't have internet access.
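A minimal sketch of the tarpit-plus-geographic-rate-limit idea from the post above, assuming a per-client token bucket whose size depends on the client's country, and a tarpit that returns a delay instead of refusing the request. The country lookup (e.g. a GeoIP database) is out of scope, and `RegionRateLimiter`, the region set, the capacities, and the refill rates are hypothetical, illustrative values, not the author's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float
    last: float  # time of the last refill, in seconds


# Hypothetical policy: generous limits where most visitors come from, strict elsewhere.
GENEROUS = (60.0, 1.0)  # (bucket capacity, tokens refilled per second)
STRICT = (10.0, 0.1)
GENEROUS_REGIONS = {"CA", "US"}


class RegionRateLimiter:
    def __init__(self, tarpit_seconds: float = 5.0):
        self.buckets: dict[str, Bucket] = {}  # client id (e.g. an IP) -> its bucket
        self.tarpit_seconds = tarpit_seconds

    def _policy(self, country: str) -> tuple[float, float]:
        return GENEROUS if country in GENEROUS_REGIONS else STRICT

    def check(self, client: str, country: str, now: float) -> float:
        """Return 0.0 to serve immediately, or a delay in seconds to tarpit the request."""
        capacity, rate = self._policy(country)
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = self.buckets[client] = Bucket(tokens=capacity, last=now)
        # Refill in proportion to elapsed time, never above capacity.
        bucket.tokens = min(capacity, bucket.tokens + (now - bucket.last) * rate)
        bucket.last = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return 0.0
        # Out of tokens: slow the client down rather than blocking it outright.
        return self.tarpit_seconds
```

The caller would wait for the returned delay before responding. Note that nothing in `check` can distinguish a scraper from an archiver, a mirroring script, or an offline reader; they all drain the same bucket, which is exactly the problem the post describes.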