The state-of-the-art models aren't at "can fully replace a knowledge worker" levels yet, and I doubt they'll get there any time soon, so charging $2000 / month for access isn't going to happen. Right now everyone and their dog is being handed subsidized credits to play with AI, but the actual outcome is rarely good enough to be worth the money they'd need to charge for it. It might very well take another order of magnitude or two of investment to get LLMs to be truly good (if it is even possible at all), and considering how much money is already being pumped into it, I just don't see that happening.
On the other hand, the dumb models are more than adequate for simple noncritical tasks, like directing a user to the appropriate FAQ entry or walking a caller through a phone decision tree. There's a lot of money in making chatbot assistants actually useful, or in augmenting website search. Turning a model into a glorified "language-to-API-call" translator doesn't take a lot of smarts, but as long as it's cheap you can make a killing on volume.
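For what it's worth, here is a minimal sketch of what such a "language-to-API-call" translator could look like. Everything in it is an assumption for illustration: `fake_model` is a keyword-matching stand-in for a small LLM prompted to emit JSON, and `lookup_faq`, `route_call`, and the FAQ text are hypothetical. The point is only that the code around the model can be a dumb allow-list check, so a wrong or malformed answer degrades to a fallback instead of doing damage.

```python
import json

FALLBACK = "Sorry, I didn't understand that."

# Hypothetical backend actions; names and data are invented for illustration.
def lookup_faq(topic: str) -> str:
    faqs = {"returns": "Returns are accepted within 30 days."}
    return faqs.get(topic, "No FAQ entry found.")

def route_call(department: str) -> str:
    return f"Transferring to {department}."

# Allow-list: action name -> (function, exact set of argument names it accepts).
ALLOWED = {
    "lookup_faq": (lookup_faq, {"topic"}),
    "route_call": (route_call, {"department"}),
}

def fake_model(user_text: str) -> str:
    """Stand-in for a small LLM asked to reply with a JSON action (assumption)."""
    text = user_text.lower()
    if "return" in text:
        return json.dumps({"name": "lookup_faq", "args": {"topic": "returns"}})
    if "billing" in text:
        return json.dumps({"name": "route_call", "args": {"department": "billing"}})
    return "I am not sure what you mean"  # models sometimes ignore the format

def dispatch(raw: str) -> str:
    """Validate the model's output and run it only if it matches the allow-list."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return FALLBACK
    if not isinstance(call, dict):
        return FALLBACK
    name, args = call.get("name"), call.get("args")
    if not isinstance(name, str) or name not in ALLOWED or not isinstance(args, dict):
        return FALLBACK
    func, expected = ALLOWED[name]
    # Require exactly the expected argument names, all with string values.
    if set(args) != expected or not all(isinstance(v, str) for v in args.values()):
        return FALLBACK
    return func(**args)

if __name__ == "__main__":
    for question in ("How do I return an item?", "I have a billing problem", "Hello?"):
        print(question, "->", dispatch(fake_model(question)))
```

In a real deployment the stub would be replaced by an actual model call, but the validation step would stay the same, which is what makes this workable with cheap models on noncritical tasks.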
This is a lane I’ve been experimenting in: seeing what I can get out of models that fit in 16GB of VRAM for simple tasks (screen scraping, decision tree navigation, natural language queries). It’s interesting for sure (it certainly reveals the limits of non-determinism) and promising for low-criticality tasks where there’s an opportunity for review, but I also feel like I need better sources or a community for understanding and reflection, preferably ones that aren’t hype channels. Any pointers?
I understood it as a proof of concept, not a single blueprint for mass production, i.e. "if you need your NN in CIM form on an ASIC, we can do it".
Their next proof of concept is reportedly about size: "we showed you we can do it with 8B, now we are working to show you we can do 24B or 32B". Then, "and we plan to go bigger and faster".
> Our second model, still based on Taalas’ first-generation silicon platform (HC1), will be a mid-sized reasoning LLM. It is expected in our labs this spring and will be integrated into our inference service shortly thereafter. // Following this, a frontier LLM will be fabricated using our second-generation silicon platform (HC2). HC2 offers considerably higher density and even faster execution. Deployment is planned for winter. (19 Feb 2026)