Cadence and rhythm: LLMs produce sentences with extremely low variability in the number of clauses. Normal people run on from time to time, (bracket in lots of asides), or otherwise vary their cadence and rhythm far more than LLMs tend to.
Section headings intended to be "cute," "snappy," or "impactful" rather than technically correct or compact: this is especially a tell when the cuteness/impactfulness is deeply mismatched with the seriousness or technical depth of the subject matter.
Horrible, trite analogies that show no real understanding of the logical, mathematical, or visuo-spatial relationships involved; i.e., analogies based on linguistic semantics rather than, e.g., mathematical isomorphism or core dynamics. "Humans cannot fly. Building airplanes does not change that; it only means we built a machine that flies for us." I can't imagine a more vacuous and useless analogy for something as complex as the article's topic.
Verbose repetition: The article introduces two workarounds, "tool use" and "agentic" orchestration, then defines them, then in the paragraph immediately following says the exact same thing. Multiple small paragraphs say nothing at all beyond the single sentence "LLMs do not reliably perform long, exact computations on their own, so in practice we often delegate the execution to external tools or orchestration systems."
Pseudo-profound bullshit (https://doi.org/10.1017/S1930297500006999): e.g., "A system that cannot compute cannot truly internalize what computation is." Thankfully there is not much of this in the article, and it appears mostly early on.
Missing key basic logic, or failing to state such points clearly, where any serious practitioner or expert would strongly expect it. In this article, we should have seen some simple, nicely centered LaTeX showing the scaled dot-product self-attention equation, followed by simple notation for the `.chunk` call and the subsequent linear projection, something like H = [H1 | H2]; I shouldn't have to squint at two small lines of PyTorch code to find this. It should also be immediately clear that this model is not trained, and that this is essentially compiling a VM into a Transformer, rather than revealing that only at the end.
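For concreteness, here is the kind of centered display I mean: the standard scaled dot-product attention equation, then the head split and output projection. This is a sketch of standard notation, not the article's own; the two-head split is my assumption based on the `.chunk` call, and the symbol W^O for the subsequent linear projection is my own labeling.

```latex
\[
\operatorname{Attention}(Q, K, V)
  = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]
\[
H = [\,H_1 \mid H_2\,]\, W^{O},
\qquad
H_i = \operatorname{Attention}\!\left(Q W_i^{Q},\; K W_i^{K},\; V W_i^{V}\right)
\]
```

Two lines of display math like this would make the mechanism legible at a glance, which is exactly what the buried PyTorch snippet fails to do.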
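And the "two small lines of PyTorch" presumably look something like the following sketch. Everything here is hypothetical (function name, shapes, the choice of two heads) and reconstructed from the critique's description of a `.chunk` call followed by a linear projection; it is not the article's actual code.

```python
import torch

def two_head_attention(x, w_qkv, w_out):
    """Sketch of a two-head self-attention block using .chunk and a final projection.

    x:     (seq, d_model) input
    w_qkv: (d_model, 3 * d_model) stacked Q/K/V projection (hypothetical layout)
    w_out: (d_model, d_model) the subsequent linear projection, W^O in math notation
    """
    # Project to Q, K, V and split them apart.
    q, k, v = (x @ w_qkv).chunk(3, dim=-1)

    # The .chunk call the critique refers to: split each of Q, K, V into two heads.
    q1, q2 = q.chunk(2, dim=-1)
    k1, k2 = k.chunk(2, dim=-1)
    v1, v2 = v.chunk(2, dim=-1)

    def head(q_h, k_h, v_h):
        # Scaled dot-product attention for one head.
        d_k = q_h.shape[-1]
        scores = torch.softmax(q_h @ k_h.transpose(-2, -1) / d_k**0.5, dim=-1)
        return scores @ v_h

    # H = [H1 | H2], then the linear projection.
    h = torch.cat([head(q1, k1, v1), head(q2, k2, v2)], dim=-1)
    return h @ w_out
```

Note that nothing above is trained: with fixed, hand-chosen weight matrices this is exactly the "compiling a VM into a Transformer" situation the article should have flagged up front.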