upvote
Nah, optical compression is a thing. You see it in a lot of different areas in ML. In this case, the "trick" has been known for a while, and belongs to a whole world of compression research. But I think where you're maybe getting mixed up is in where that 60% gain is coming from.

It's not a 60% percent reduction in cost for 100% of the same output. If you have a model and input text A, and you fix the seed etc. and run Text A through the model as text tokens and as compressed image tokens, you will not get identical outputs. You're specifically reducing the number of tensors needed to represent your input, which saves you on raw compute, but also by definition gives you less room to represent the information in your input. It's lossy, in other words.

Put another way, if you're using a model like Fable because you need the absolute frontier of capability and cheaper models cannot solve your tasks, then there is a very real chance that a compression strategy like this drops Fable's accuracy such that it's no longer suitable for your task. Which defeats the point of you paying for the most expensive model in the first place.

So, it's cool research. Might be useful for some people. Probably isn't something that has incredible utility in real use cases.

reply
> a compression strategy

To me compression implies smaller size? However new line chars seems to be removed in the pic so I guess it could be expressed in fewer bytes than the original text with further compression ...

reply
> Some random person discovered a 60% across the board gain in all LLMs, using an extremely simple trick that none of the labs noticed in all these years of multi-trillion dollar growth

DeepSeek published a pretty well circulated paper on exactly this many months ago. It just hasn’t been attempted and shared publicly, asa retrofit, AFAIK.

Also, it’s no free lunch, the readme indicates that this “use images” hack is lossy and reduces success rates alongside the reduced cost. Most labs would focus on success increases regardless of price.

reply
If the trick were genuinely useful, and was well circulated months ago, the resource-starved inference providers would have squeezed this trick dry already, instead of wasting 60% of their tokens, waiting for users to implement it themselves in 5 minutes of effort.
reply
100%.

I miss when HN was mostly people that knew what they were talking about.

reply
Alternative 1 isn’t all that unlikely given Opus 4.8 couldn’t do this. So it’s a recently possible hack. Not something LLM corps have been blindsided by for years. I also strongly recommend RTFA in this case, namely ”The honest part, read before relying on it”
reply
I think you missed the part where this is a lossy technique that reduces performance.

The image trick reduces context because it’s lossy. The README says you can’t use it for anything needing exact recall. It produces a gist of the input.

You could achieve something similar by using a small, cheap model to pre-summarize information for the expensive LLM. This is what many people do already and it’s a much better way to do it for most situations.

reply
This has been known since VLMs were a thing, that more information can be encoded visually and token efficiency is increased. But it came with performance issues (more hallucinations, etc).

Also I don't think you realize how much dumb stuff is still left on the table. That the market is worth trillions is quite irrelevant here given the dynamism of the field.

reply