There goes another bit of my writing style that will get mistaken for an LLM.
> Redefining AI efficiency with extreme compression
"Redefine" is a favorite word of AI. Honestly no need to read further.
> the key-value cache, a high-speed "digital cheat sheet" that stores frequently used information under simple labels
No competent engineer would describe a cache as a "cheat sheet". Cheat sheets are static, but caches dynamically update during execution. Students don't rewrite their cheat sheets during the test, do they? LLMs love their inaccurate metaphors.
> QJL: The zero-overhead, 1-bit trick
> It reduces each resulting vector number to a single sign bit (+1 or -1). This algorithm essentially creates a high-speed shorthand that requires zero memory overhead.
Why does it keep emphasizing zero overhead? Why is storing a single bit a "trick"? Either there's currently an epidemic of algorithms that use more than one bit to store a bit, or the AI is shoving in extra plausible-sounding words to pad things out. You decide which is more likely.
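For reference, here's a minimal sketch of what the quoted sentence seems to describe: project a vector onto some random directions and keep only the sign of each result. The function names, the Gaussian directions, and the bit-packing into an int are my own assumptions for illustration, not the actual QJL implementation. As far as I can tell, the "overhead" in question would be the per-block scale and zero point that ordinary quantization schemes store alongside the quantized values, which a sign-only code doesn't need.

```python
import random

def sign_bit_sketch(vec, num_bits, seed=0):
    """Project vec onto num_bits random directions and keep one sign bit each.

    Hypothetical helper: the i.i.d. Gaussian directions and the int bit-packing
    are illustrative assumptions, not taken from the article or the paper.
    """
    rng = random.Random(seed)  # same seed -> same directions for every vector
    bits = 0
    for i in range(num_bits):
        direction = [rng.gauss(0.0, 1.0) for _ in vec]
        dot = sum(d * x for d, x in zip(direction, vec))
        if dot >= 0:
            bits |= 1 << i  # store only the sign, nothing about the magnitude
    return bits

def hamming_distance(a, b):
    """Number of sign bits on which two sketches disagree."""
    return bin(a ^ b).count("1")

v = [1.0, 2.0, -0.5]
doubled = [2.0 * x for x in v]
negated = [-x for x in v]

# Rescaling the vector does not change any sign, so no scale factor
# has to be stored next to the bits.
same = sign_bit_sketch(v, 64) == sign_bit_sketch(doubled, 64)
# Flipping the vector flips every sign bit.
flipped = hamming_distance(sign_bit_sketch(v, 64), sign_bit_sketch(negated, 64))
```

The sketch is all that gets stored: 64 bits per vector here, with no extra constants next to it. Whether that deserves the word "trick" is a separate question.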
It's 1:30am and I can't sleep, and I still regret wasting my time on this slop.
It's the structure and rhythm at the sentence and paragraph levels that's the current tell, as SOTA LLMs all seem to overuse clarification constructs like "it's not X, it's Y", "it's X, a Y and a Z", and "it's X, it's essentially doing Y".
Thing is, I actually struggle to find what's so off-putting about these, given that they're usually used correctly. So far, the best hypothesis I have for what makes AI text stand out is that LLM output is too good. Most text written by real humans (including my own) is shit, with the best of us caring about communicating clearly, and most people not even that; nobody spends time refining the style and rhythm, unless they're writing a poem. You don't expect a blog post or a random Internet article (much less a HN comment) to be written in the same style as a NYT bestseller for a general audience - but LLMs do that naturally. They write text better at the paragraph level than most people ever could, which stands out as jarring.
> Either there's currently an epidemic of algorithms that use more than one bit to store a bit, or the AI is shoving in extra plausible-sounding words to pad things out. You decide which is more likely.
Or, those things matter to authors and possibly the audience. Which is reasonable, because LLMs made the world suddenly hit hard against global capacity constraints in compute, memory, and power; between that and edge devices/local use, everyone who pays attention is interested in LLM efficiency.
(Still, it makes sense to do it as a post-processing style transfer step, as verbosity is a feature while the model is still processing the "main" request - each token produced is a unit of computation; the more terse the answer, the dumber it gets (these days it's somewhat mitigated by "thinking" and agentic loops)).
You're not wrong, but it certainly is an annoying outcome of AI that we're not allowed to use.. words.. anymore.
It reads like a pop science article while at the same time being way too technical to be a pop science article.
Turing test ain't dead yet.
Only because people are lazy, and don't bother with a simple post-processing step: attach a bunch of documents or text snippets written by a human (whether yourself or, say, some respected but stylistically boring author), and ask the LLM to match style/tone.