undefined

points

[-]

Configure a subagent in your coding harness to spin up a new sub-session with any vision model for those tasks and feed the result back to the main model. No need for "one model that does everything"

by WASDx3 hours ago|

parent|

[-]

Are you suggesting it should summarize the image in text or generate it in HTML or something else?

by x3cca1 hours ago|

prev|

[-]

I've been using Google ai studio as a free vision bridge. Gemma 31B is dummy capable at vision and at 1500 rpd its basically unlimited.

by _pdp_6 hours ago|

prev|

[-]

I don't see this being such a big gap. There are some use-cases for sure but apart from UX/UI work it is not really needed. Besides, none of the frontier models can replicate actual images - the can approximate at least in my own experience.

by simonw6 hours ago|

parent|

[-]

One of my tests for a new model is dumping in a screenshot of a web page and seeing if it can recreate it from scratch in HTML and CSS.

Even the local models I run on my Mac are getting surprisingly good at that now.

by tiahura5 hours ago|

parent|

prev|

[-]

Using llms to generate docx. Being able to rasterize and review is an important part of the process.

by 5 hours ago|

parent|

prev|

[-]

deleted

by ashenke5 hours ago|

prev|

[-]

I had the same reaction with Deepseek V4 ! It would be more useful as a vision model