I think average users are already okay with the reasoning level they'd get from current open models. But the big AI firms have pivoted their frontier models towards the enterprise: coding and research, as opposed to general chat. Scale is quite important for these uses, and ordinary pro hardware is not enough.
This is really just a question of product design meeting the technology.

Today, lots of integer compute happens on local devices for some purposes, and in the cloud for others.

The same is already true for matmul: lots of FLOPS are being spent locally on photo and video processing, speech to text, …

There's no obvious reason you wouldn’t want to specialize LLM tasks similarly, especially as long-running agents increasingly take over from chatbots as the dominant interaction architecture.
