This is really fascinating to me. I was reading this article and originally agreed with you, "I mean, under the covers it's got to be converting to text tokens at some point, so there is no way it's actually cheaper for Claude itself to execute."

But then a comment below describes how DeepSeek got a huge improvement in compression by using visual tokens: https://news.ycombinator.com/item?id=48777848. I don't fully understand the underlying technical details, so I am still baffled about how going the OCR route could actually result in overall electricity or computational savings.

It wouldn’t. They’re subsidizing it for training.