This is MCP or a custom call to the lowest-cost model.

Someone did a webcam + agentic setup: capture another computer's BIOS/boot screen -> upload to an image model -> result back to the agent.
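
For anyone wondering what that capture -> image model -> agent loop looks like in code, here is a minimal sketch. Everything in it is an assumption for illustration: `build_vision_request`, `relay_screen_to_agent`, and the payload layout are hypothetical names, not any real MCP or vendor API. The capture step and the model call are passed in as plain callables, so you would plug in your own webcam grabber and whichever cheap vision model you call.

```python
import base64
from typing import Callable


def build_vision_request(image_bytes: bytes, prompt: str,
                         mime: str = "image/png") -> dict:
    """Package a captured frame as a JSON-friendly payload.

    The schema here is hypothetical; adapt it to whatever your
    image model or MCP tool actually expects.
    """
    return {
        "prompt": prompt,
        "image": {
            "mime_type": mime,
            "data_b64": base64.b64encode(image_bytes).decode("ascii"),
        },
    }


def relay_screen_to_agent(
    capture_frame: Callable[[], bytes],
    describe_image: Callable[[dict], str],
    prompt: str = "Transcribe all text visible on this screen.",
) -> str:
    """Capture one frame, send it to the image model, return its text.

    capture_frame: returns encoded image bytes (e.g. from a webcam).
    describe_image: sends the payload to a vision model, returns its reply.
    """
    frame = capture_frame()
    if not frame:
        raise ValueError("capture returned no image data")
    request = build_vision_request(frame, prompt)
    # The returned text is what goes back into the agent's context.
    return describe_image(request)
```

The agent never sees the image itself, only the text the image model returns, which is why a low-cost model can be enough for reading a boot screen.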

reply
What do you use vision for? I have failed to find a workflow with it that makes sense. When I ask it to review screenshots of websites or whatever, it misses extremely obvious details like text flowing out of its container or overlapping other text, things being in entirely the wrong place, etc.
reply
What models have you tried? Gemini 3.1 Pro has vision capable of reading my sloppy diaries from 10 years ago, down to small glyphs and doodles.
reply
I mean, they mostly work for OCR. I meant in a coding context.
reply
For coding?
reply