My productivity benefits from the best intelligence available, a decent context size, and a batch size of four.
My MacBook has 48 GB of RAM, but I want all of the above at a decent speed, and I also need the machine to run my development tools and test suites, ideally without the fans blasting at full load.
For the foreseeable future I will stay with providers rather than local inference, apart from niche use cases.
I'm in Australia, so we're probably not getting access to Fable again. We're learning that a faster model + better harness/framework > smarter model. So being able to run GLM5.2 locally and super-fast would be great.
But the existing tech we're using for 16 GB probably isn't going to scale to 16 TB at a reasonable price point. And the price point is relatively inelastic - people are used to paying <$5K for their computers, and they're not going to go much above that. You'll get early adopters paying $10K or more for a machine that large, but not the early majority. And even then, obviously, $10K is not going to buy you a machine with 16 TB of memory.
So there's room for a new technology to come in, where there wasn't previously. This is what happened all through the '90s, when we churned through a bunch of standards and technologies to try and keep up with demand.