The A18 iPhone chip has 15b transistors for the GPU and CPU; the Taalas ASIC has 53b transistors dedicated to inference alone. If it's anything like NPUs, almost all vendors will bypass the baked-in silicon to use GPU acceleration past a certain point. It makes much more sense to ship a CUDA-style flexible GPGPU architecture.
Dedicated inference ASICs are a dead end. You can't reprogram them, you can't finetune them, and they won't keep any of their resale value. Outside cruise missiles it's hard to imagine where such a disposable technology would be desirable.
For a 2.5 kW Server? I don't see it happening, your money and electricity is better spent on CUDA compute.
Planned obsolescence? /s
Jokes aside, they can make the "LLM chip" removable. I know almost nothing is replaceable in MacBooks, but this could be an exception.