LLMs have a very bloated in-memory representation for text, on the order of megabytes of KV cache per byte of text. For images, meanwhile, a lossy representation is considered acceptable, and it takes only about a kilobyte of KV cache per byte of image. So if you render your text at a hundred bytes of image per byte of text, it lossily expands into about 100 kB of KV cache per byte of text, and you come out ahead!

Whether such lossy compression is acceptable for your use case is up to you.
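
To make the arithmetic above concrete, here is a minimal sketch in Python. The three constants are the rough orders of magnitude quoted in the comment, not measurements of any particular model, and the function names are made up for illustration.

```python
# Rough orders of magnitude quoted in the comment; assumptions, not measurements.
KV_BYTES_PER_TEXT_BYTE = 1_000_000   # "megabytes of KV cache per byte of text"
KV_BYTES_PER_IMAGE_BYTE = 1_000      # "maybe a kilobyte of KV cache per byte of image"
IMAGE_BYTES_PER_TEXT_BYTE = 100      # "a hundred bytes of image per byte of text"


def kv_bytes_via_image(image_bytes_per_text_byte=IMAGE_BYTES_PER_TEXT_BYTE,
                       kv_bytes_per_image_byte=KV_BYTES_PER_IMAGE_BYTE):
    """KV cache bytes per byte of text when the text is rendered as an image."""
    return image_bytes_per_text_byte * kv_bytes_per_image_byte


def savings_factor(kv_bytes_per_text_byte=KV_BYTES_PER_TEXT_BYTE):
    """How many times smaller the image route is than feeding the text directly."""
    return kv_bytes_per_text_byte / kv_bytes_via_image()


if __name__ == "__main__":
    print(f"KV cache via image: {kv_bytes_via_image():,} bytes per text byte")
    print(f"Savings factor: {savings_factor():.0f}x")
```

With these figures the image route costs 100,000 bytes of KV cache per byte of text, about a 10x reduction. The result scales directly with whichever of the three inputs you change, so it is only as good as those estimates.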

It wouldn’t; they’re subsidizing it for training.

Edit: I didn’t realize this also occurs on local models(!!).

This explanation is smarter: https://news.ycombinator.com/item?id=48779884

Subsidies can't explain it for a model you host yourself (like DeepSeek).

Then you are paying for the electricity. It's not physically possible to do more computation without using more energy, because every arithmetic operation requires a minimum amount of energy, so more operations means more energy.
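
The "minimum amount of energy" claim reads like an appeal to Landauer's principle, though the comment does not name it, so treat that reading as an assumption. The principle bounds the energy dissipated per irreversibly erased bit at k_B · T · ln 2. It applies to bit erasure, not to every arithmetic operation as such. A minimal sketch of the bound, with hypothetical function names:

```python
import math

# Boltzmann constant in joules per kelvin (exact SI value).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_limit_joules(temperature_kelvin=300.0):
    """Lower bound on energy dissipated per irreversibly erased bit."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


def min_energy_joules(bit_erasures, temperature_kelvin=300.0):
    """Lower bound for a given number of bit erasures; linear in the count."""
    return bit_erasures * landauer_limit_joules(temperature_kelvin)


if __name__ == "__main__":
    print(f"Per bit at 300 K: {landauer_limit_joules():.3e} J")
    print(f"For 1e18 erasures: {min_energy_joules(1e18):.3e} J")
```

At room temperature the bound is a few zeptojoules per bit, many orders of magnitude below what real hardware dissipates, so it supports "more operations means more energy" only as a floor, not as an estimate of an actual power bill.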