Note by @fastfinge

@pvd1313 @the5thColumnist @RachelThornSub Depends. I like to start with deepseek-ocr if I have any reason to suspect the image is text. If it is, I can stop there. Otherwise, I move up to something like microsoft/phi-4-multimodal-instruct. If I still care and didn't get enough, llama-3.2-90b-vision-instruct will do the trick for most things. Only if it's charts and graphs that I care about do I need to use either the Google or OpenAI models. If it's pornographic, I have to use Grok, because XAI is completely and utterly unhinged and won't refuse anything no matter what. I use everything either locally where possible, or via the openrouter.ai API. That way it's more private, and I'm only paying for what I use. I usually use the tool: github.com/SigmaNight/basiliskLLM

It supports ollama, openrouter, and any openAI compatible endpoint, and integrates perfectly with the NVDA screen reader.