🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
Admin
completely blind computer geek, lover of science fiction and fantasy (especially LitRPG). I work in accessibility, but my opinions are my own, not that of my employer. Fandoms: Harry Potter, Discworld, My Little Pony: Friendship is Magic, Buffy, Dead Like Me, Glee, and I'll read fanfic of pretty much anything that crosses over with one of those.
keyoxide: aspe:keyoxide.org:PFAQDLXSBNO7MZRNPUMWWKQ7TQ
Location
Ottawa
Birthday
1987-12-20
Pronouns
he/him (EN)
matrix @fastfinge:interfree.ca
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@sinmisterios @the5thColumnist @RachelThornSub Another thing you could do is just copy and paste an explanation of your issue into the alt text. Odds are someone else will write it for you. Or a blind person who comes across the image will ask. Accessibility for people with disabilities shouldn't mean silencing the voices of other people with disabilities. You could also create an image-only account that says in the profile that you can't write alt-text. That way people who never want images we can't understand in our timelines could follow your main account and ignore your image-only account.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@disorderlyf That's why I went for shock value in the example. I was feeling like CatHat was just looking for a fight, so I came out swinging.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@disorderlyf Yeah, I was afraid that's what had happened. I made an intentionally awful example to make a point. But it's possible the example was so awful that the point was a bit lost.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@pvd1313 @the5thColumnist @RachelThornSub TalkBack is what fully blind folks use, and it works well. But it needs training from a specialist; nobody can learn it completely on their own. However, dictation on the Nokias should work for making calls and answering messages. I really don't know how accessible Telegram is with dictation or TalkBack these days, though. Unfortunately, I use iOS, not Android. @dhamlinmusic do you know anything?
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@disorderlyf @CStamp @CatHat @RachelThornSub Yes, exactly. I've often wondered about some kind of API to collect the hashes of popular memes that get reposted endlessly, and just fill in the alt-text for them automatically. Memes are one of the few places where the alt-text can just be reused, in the same way the image is just being reused.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@pvd1313 @the5thColumnist @RachelThornSub What kind of phone does she have?
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@b Yes, the blind community is quite strong, and almost all of us moved over from Twitter when they discontinued the API and broke accessibility. It wasn't like in sighted communities, where some percentage is on Mastodon and some is on Twitter. Nearly a hundred percent of the blind community that was active on Twitter was forced to move.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@disorderlyf @CStamp @CatHat @RachelThornSub I was intentionally using that as a bad example. As for the shirt: no. Why does it matter in a stock photo from a news article? It has no relevance to the subject of the article (her getting banned from competing), so it probably shouldn't be there. In your example, both the image and the caption matter. But you didn't describe Marge Simpson, and that was the correct call: what she looks like doesn't matter to what you're communicating.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@pvd1313 @the5thColumnist @RachelThornSub Depends. I like to start with deepseek-ocr if I have any reason to suspect the image is text. If it is, I can stop there. Otherwise, I move up to something like microsoft/phi-4-multimodal-instruct. If I still care and didn't get enough, llama-3.2-90b-vision-instruct will do the trick for most things. Only if it's charts and graphs that I care about do I need to use the Google or OpenAI models. If it's pornographic, I have to use Grok, because xAI is completely and utterly unhinged and won't refuse anything, no matter what. I run everything either locally where possible, or via the openrouter.ai API. That way it's more private, and I'm only paying for what I use. I usually use this tool: github.com/SigmaNight/basiliskLLM

It supports Ollama, OpenRouter, and any OpenAI-compatible endpoint, and integrates perfectly with the NVDA screen reader.
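The cheapest-model-first ladder described above can be sketched as plain Python. The model IDs are the ones named in the post; `describe_fn` and `good_enough` are assumed callbacks (e.g. a wrapper around a local Ollama model or the openrouter.ai chat API, and your own quality check), so the escalation logic itself needs no network access.

```python
# Models named in the post, cheapest first.
MODEL_LADDER = [
    "deepseek-ocr",                              # tiny: great if the image is text
    "microsoft/phi-4-multimodal-instruct",       # small general multimodal model
    "meta-llama/llama-3.2-90b-vision-instruct",  # big fallback for everything else
]

def describe_image(image, describe_fn, good_enough):
    """Try each model in order, stopping at the first acceptable answer.

    describe_fn(model_id, image) -> str is whatever backend you use;
    good_enough(description) -> bool is your own quality check.
    """
    description = None
    for model in MODEL_LADDER:
        description = describe_fn(model, image)
        if good_enough(description):
            break
    return description  # the last attempt, even if nothing passed the check
```

The point of the ladder is that most requests never reach the expensive model: text-heavy images stop at the 3b OCR model, and only the leftovers escalate.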
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@pvd1313 @the5thColumnist @RachelThornSub This helps a lot, yes. Though if I know you just AI-generated it, I'm probably not even going to keep reading. My AI is almost certainly better than yours, because I use it constantly and have customized the settings to get it as accurate as these things are capable of being.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@CatHat @CStamp @RachelThornSub No, it isn't good at judging, because (hopefully) it's not the author or poster. The thing AI is most useful for is when the information that's relevant to me and the information that's relevant to the author are mismatched. For example, if I have a burning desire to know what colour Penny Oleksiak's hair is, I can ask AI. It wasn't important to the article I was reading, so the author, correctly, did not include it. The AI says it's auburn. This is now a thing I can know myself, whereas I would previously have had to bother a sighted person about something trivial like this. And I'm aware that there's about a 15 percent chance the AI is telling lies again. If it is, no harm done, really. The things I tend to ask AI about images aren't important enough to bother a sighted person over, and I would previously just be stuck not knowing at all. Now I can know with about 85 percent certainty.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@CatHat @CStamp @RachelThornSub How else should I frame it? Describing an image for someone else is an inherently social act. It’s akin to language translation in some ways: only the translator is qualified to judge the accuracy of the effort, and may be the only one with the context to do it correctly. And there are some things that can easily be described in one language, and only with difficulty in another. There are also expressions and idioms that might not carry over at all. But in other ways it’s harder, because there is a data mismatch: an image contains a lot more data than a stream of words does. So the translator also has to decide what is important, while remembering that what they include says as much as what they don’t. See, for example, the argument about mentioning race in alt text. If you don’t mention it, the reader is going to assume the person in the image is white, because that’s most readers’ default. But if it’s, say, a mug shot and you do include it, you’re now making an entirely different statement. That’s why it’s on you, as the author, to decide what impression your blind reader should be left with, and craft your alt text to that effect.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@CStamp @CatHat @RachelThornSub Honestly though, as someone born blind, it took me years to learn just how many of the images sighted people attach to things are "because the layout looks better with an image there" or "we need a thumbnail for the thing" or "everything else like this has an image so they all have to have one" or "I can post images so damn it I'm going to" or "boobs or blood so you'll click". From what I can tell, on most websites, only maybe ten percent of the images on a page need alt-text longer than a few words, because the people who added the rest are communicating nothing useful at all. On Mastodon, obviously, that's different, because you intentionally posted the image. But even here, ninety percent of images are posted to communicate "nature is pretty" or "my cat is cute". It's the other ten percent that need the long and thoughtful alt text.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@CatHat @CStamp @RachelThornSub But the facial expressions are not always important. In my Penny Oleksiak example, if this was a photograph of her in tears, gutted that she was banned, her expression would matter. But if it was just a stock photo included because "the layout requires a picture here", it's not, and the alt text should be much shorter.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@CatHat @CStamp @RachelThornSub And who decides what information is important? That's my point. The author knows why they posted the image, so they know what the "important information" they want me to have is. If the alt text just describes every single thing, it's now four paragraphs long, and whatever was important to the author probably got lost in the noise.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@CatHat @CStamp @RachelThornSub I was born completely blind, and I still am. So image descriptions are the only way I have of making sense of pictures.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@the5thColumnist @RachelThornSub My suggestion would be to keep it simple. If the reason you posted the photo was because it was a pretty flower, well...that's fine for the alt-text. No matter how many words you use, you might not be able to communicate the exact feeling of beauty you experienced. If you could, you'd be a writer, not a photographer. Ask yourself why you posted, and what you want someone to take away from it. If you want them to notice the colour, or the size, or whatever, those are what goes in the alt text.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@RachelThornSub @milkman76 And this is an entirely false argument. An AI specialized in describing images can run on a consumer PC these days. It's doing zero of the things you're talking about. Apple has done image descriptions locally on its phones for five years now. If you're just tossing images at ChatGPT, you're doing it wrong, the same way as if you gave ChatGPT a CSV file and told it to sort it for you. There are way, way better ways of doing that, ways that get you the result you want quicker, without the resource waste.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@CatHat @CStamp @RachelThornSub This is neither possible nor desirable. I just read an article on Penny Oleksiak and how she was banned from competition for two years. It had a picture of her; should it tell me what colour her shirt was? Or if she was wearing a hat? Or if her shoes were visible in the photo? Maybe sighted people could estimate her breast size, too; where should we stop providing information that a sighted person could perceive? "Only the important information," you say? If I'm sexually attracted to Canadian Olympic swimmers, her physical attributes might be what I consider important information. This is a silly standard, and holding writers of alt-text to it is why so many people think they're bad at writing alt-text, and are scared to try.

The questions that matter are: why did the author include this image? What did they want to communicate to the reader by including it? What does the author think is important for me to know about it? The author can't know me and my interests. But they do know themselves, and they know why they included that image in that place in that article.
🇨🇦Samuel Proulx🇨🇦 @fastfinge@interfree.ca
4d
@RachelThornSub So as an actual blind user who uses AI regularly...no, not really. If you include AI-generated alt-text, the odds are you're not checking it for accuracy. But I might not know that, so I assume the alt-text is more accurate than it is. If you don't use any alt-text at all, I'll use the AI tools built into my screen reader to generate it myself if I care, and I know exactly how accurate or trustworthy those tools may or may not be. This has a few advantages:
1. I'm not just shoving images into ChatGPT or some other enormous LLM. I tend to start with deepseek-ocr, a 3b (3 billion parameter) model. If that turns out not to be useful because the image isn't text, I move up to one of the 90b Llama models. For comparison, ChatGPT and Google's LLMs are all 3 trillion parameters or larger. A model specializing in describing images can run on a single video card in a consumer PC. There is no reason to use a giant data center for this task.
2. The AI alt text is only generated if a blind person encounters your image, and cares enough about it to bother. If you're generating AI alt text yourself, and not bothering to check or edit it at all, you're just wasting resources on something that nobody may even read.
3. I have prompts that I've fiddled with over time to get the most accurate AI descriptions these things can generate. If you're just throwing images at ChatGPT, what it writes is probably not accurate anyway.
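The local-model point in 1 comes down to request shape, which can be sketched as follows. This is a hypothetical example, not a recommended setup: Ollama does serve an OpenAI-compatible endpoint at http://localhost:11434/v1/chat/completions, but the model name and prompt here are illustrative placeholders.

```python
import base64

def build_description_request(image_bytes: bytes, prompt: str,
                              model: str = "llama3.2-vision") -> dict:
    """Build the JSON body for an OpenAI-style chat completion call
    that attaches an image as a base64 data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                # Images travel inline as data URLs in this API shape.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

You would POST this body with any HTTP client; nothing leaves the machine, which is the whole argument about not needing a data center.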

If you as a creator provide alt text, you're making the implicit promise that it's accurate, and that it attempts to communicate what you meant by posting the image. If you can't, or don't want to, make that promise to your blind readers, don't just paste in AI output. We can use AI ourselves, thanks. Though it's worth noting that if you're an artist and don't want your image tossed into the AI machine by a blind reader, you'd better provide alt text. Because if you didn't, and I need or want to understand the image, into the AI it goes.