It's not a 60% percent reduction in cost for 100% of the same output. If you have a model and input text A, and you fix the seed etc. and run Text A through the model as text tokens and as compressed image tokens, you will not get identical outputs. You're specifically reducing the number of tensors needed to represent your input, which saves you on raw compute, but also by definition gives you less room to represent the information in your input. It's lossy, in other words.
Put another way, if you're using a model like Fable because you need the absolute frontier of capability and cheaper models cannot solve your tasks, then there is a very real chance that a compression strategy like this drops Fable's accuracy such that it's no longer suitable for your task. Which defeats the point of you paying for the most expensive model in the first place.
So, it's cool research. Might be useful for some people. Probably isn't something that has incredible utility in real use cases.
To me compression implies smaller size? However new line chars seems to be removed in the pic so I guess it could be expressed in fewer bytes than the original text with further compression ...
DeepSeek published a pretty well circulated paper on exactly this many months ago. It just hasn’t been attempted and shared publicly, asa retrofit, AFAIK.
Also, it’s no free lunch, the readme indicates that this “use images” hack is lossy and reduces success rates alongside the reduced cost. Most labs would focus on success increases regardless of price.
I miss when HN was mostly people that knew what they were talking about.
The image trick reduces context because it’s lossy. The README says you can’t use it for anything needing exact recall. It produces a gist of the input.
You could achieve something similar by using a small, cheap model to pre-summarize information for the expensive LLM. This is what many people do already and it’s a much better way to do it for most situations.
Also I don't think you realize how much dumb stuff is still left on the table. That the market is worth trillions is quite irrelevant here given the dynamism of the field.