I still can't get a good mental model for when these things will work well and when they won't. Really does feel like gambling...
All Chinese labs have to do to tank the US economy is to release open-weight models that can run on relatively cheap hardware before AI companies see returns.
Maybe that's why AI companies are looking to IPO so soon, gotta cash out and leave retail investors and retirement funds holding the bag.
Regarding the latter, smaller models are really good for what they are (free) now, they'll run on a laptop's iGPU with LPDDR5/DDR5, and NPUs are getting there.
Even models that can fit in unified 64GB+ memory between CPU & iGPU aren't bad. Offloading to a real GPU is faster, but with the iGPU route you can buy cheaper SODIMM memory in larger quantities, still use it as unified memory, eventually use it with NPUs, all without using too much power or buying cards with expensive GDDR.
Qwen-3.5 locally is "good enough" for more than I expected, if that trend continues, I can see small deployable models eventually being viable & worthy competition, or at least being good enough that companies can run their own instead of exfiltrating their trade secrets to the worst people on the planet in real-time.
Of course it's in the areas where it doesn't matter as much, like experiments, internal tooling, etc, but the CTOs will get greedy.