That was never the aim. LLMs are not designed to be generally intelligent, just to be really good at producing believable text.
That's apparently about 6k books' worth of data.
Oh, come on, surely not just a couple months.
Benchmarks may boast some fancy numbers, but I just tried to save some money by using Qwen3-Next 80B and Qwen3.5 35B-A3B (since I've recently got a machine that can run those at a tolerable speed) to generate some documentation from a messy legacy codebase. It was nowhere close, in either output quality or performance, to any of the current models that the SaaS LLM behemoth corps offer. Just an anecdote, of course, but that's all I have.