For example:
> Honest caveat, visible in the clip: the pxpipe arm answered the count first and needed one follow-up nudge to also print the ledger balance in the requested one-line format; the plain arm followed the format on the first try. Legibility is solved on Fable — single-reply format compliance is the remaining rough edge.
If I reread this four times, I can sort of interpolate what happened, but it’s mostly pointless and confusing information.
In my experience all models do this to an extent, but Claude seems to be the worst at this. GPT 5.5 is a bit more terse but seems to compress more valuable information.
My guess is that it's a known problem, which steered the frontier models into bullet point preference.
To be fair, as you can see in the clip, the two models handled the prompt slightly differently. The pxpipe variant gave the right count initially but needed a quick follow-up to output the ledger balance in a single line. The standard model, on the other hand, nailed the formatting on its first try. We've completely solved readability here on Fable; our only real hurdle left is getting the models to follow formatting constraints perfectly on the very first reply.
Of course, this was just rewritten by another LLM.