That was never the aim. LLMs are not designed to be generally intelligent, just to be really good at producing believable text.
That's apparently about 6k books' worth of data.
Oh, come on, surely not just a couple months.
Benchmarks may boast some fancy numbers, but I just tried to save some money by trying out Qwen3-Next 80B and Qwen3.5 35B-A3B (since I've recently got a machine that can run those at a tolerable speed) to generate some documentation from a messy legacy codebase. It was nowhere close neither in the output quality nor in performance to any current models that the SaaS LLM behemoth corps offer. Just an anecdote, of course, but that's all I have.
Costs a few hundred thousand per server, it's a huge expense if you want it at your home but a rounding error for most organizations.
Newer Blackwell does 200+ tokens per second on the largest models and tens of thousands on the smaller models. Most military applications require fast smaller models, I'd imagine.
Also, custom chips are reportedly approaching an order of magnitude more for the price. It's a matter of availability right now, but that will be solved at some point.