> you'd end up consuming so much excess electricity it would be cheaper on net to simply take the money that would have gone to the power bill and spend it on your own datacenter.
Spread over a large population, the cost really doesn't matter. You're not going to get hundreds of thousands of people to pitch in half their monthly electric bill to pay for someone else's datacenter. They will quite happily pay for the electricity themselves, though, if all they need to do is give you compute. This isn't new.
Interconnect is the bottleneck for distributed training, nothing else really.
Then have inference go down to the next layer to use those models as a P2P decentralized network.
Maybe something like OpenRouter could tap federated networks.
Not sure what you are referring to, unless you don't think H100/H200/B200 are "AI hardware".
> Superpods aren't really power efficient
Maybe not compared to a specialized rig with multiple 4090s, but that is the best case for consumer hardware; the vast majority will be dramatically less efficient than that.
Anyway, I agree the interconnect is by far the biggest obstacle and seems insurmountable, I should probably have led with that.
I recall getting really excited over Hinton's FF (Forward-Forward) foray, right before he bailed on AI as a societal direction (which, if anyone ever had the right, I suppose he does). If one squints, one can see a backprop-free base being much easier to train on geographically distributed and heterogeneous hardware.
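For anyone who hasn't looked at it, here is a rough sketch of why a Forward-Forward-style layer looks attractive for this: each layer is trained against its own local "goodness" objective (sum of squared activations, pushed above a threshold for positive data and below it for negative data), so no gradient has to travel back through other layers or other machines. This is a single-layer toy under my own assumptions, not Hinton's reference implementation; the names `forward`, `goodness`, `ff_layer_step`, the threshold `theta`, the learning rate, and the toy data are all made up for illustration.

```python
import math
import random


def forward(W, x):
    """ReLU(x @ W) for one example; W is a list of rows (inputs x units)."""
    return [max(0.0, sum(x[i] * W[i][j] for i in range(len(x))))
            for j in range(len(W[0]))]


def goodness(h):
    """Goodness of a layer's activity: sum of squared activations."""
    return sum(v * v for v in h)


def ff_layer_step(W, positives, negatives, theta=2.0, lr=0.03):
    """One purely local update: raise goodness on positives, lower it on negatives."""
    grad = [[0.0] * len(W[0]) for _ in W]
    for batch, sign in ((positives, 1.0), (negatives, -1.0)):
        for x in batch:
            h = forward(W, x)
            g = goodness(h)
            # Loss is softplus(-sign * (g - theta)); this is d(loss)/d(g).
            dg = -sign / (1.0 + math.exp(sign * (g - theta)))
            for i, xi in enumerate(x):
                for j, hj in enumerate(h):
                    # d(g)/d(h_j) = 2*h_j; inactive ReLU units have h_j == 0.
                    grad[i][j] += dg * 2.0 * hj * xi / len(batch)
    return [[w - lr * gw for w, gw in zip(row, grow)]
            for row, grow in zip(W, grad)]


def make_toy_data(rng, n, dim, shift):
    """Hypothetical toy data: Gaussian blobs centred at `shift` in every dimension."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


def mean_goodness(W, batch):
    return sum(goodness(forward(W, x)) for x in batch) / len(batch)


if __name__ == "__main__":
    rng = random.Random(0)
    pos = make_toy_data(rng, 32, 4, 1.0)
    neg = make_toy_data(rng, 32, 4, -1.0)
    W = [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(4)]
    for _ in range(200):
        W = ff_layer_step(W, pos, neg)
    print("positive goodness:", mean_goodness(W, pos))
    print("negative goodness:", mean_goodness(W, neg))
```

The update only ever touches this layer's own weights and inputs, which is the property that matters for the distributed case. Whether it scales to anything interesting is a separate question.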
Agree about the physics; disagree about the larger point.
I am not questioning that servers packed together may achieve an optimal result in how we are currently doing things, but, and this is my point, what if we didn't?
<< you cannot get that with distributed training
This is entirely the wrong question to ask. The question to ask is: how could it be adapted to distributed training?
100% agree. The US government basically has to nationalize AI and capture an outsize portion of the revenue from it in order to fix the economy. The combination of debt burden and interest rate pressure from de-dollarization/global realignment is going to push us into a death spiral, and even if AI is a smash hit, the ~19% federal capture of corporate revenue isn't nearly enough to pull us out of it. The people owning the compute infrastructure, and capturing more profit from AI at that layer, is the safest, cleanest way to increase revenue capture. A sovereign wealth fund is a mediocre idea because it's possible to play shell games with stocks and redirect profit/debt (venture capital is quite good at this!).
Any actual numbers to back this up? I don't see how nationalizing a very cutting-edge technology outside of wartime is going to go super well. The leverage that these companies have is the same leverage that TSMC has: you can't just take over and expect things to rocket along at the pace they're going.
Currently, AI has generated no profit and, as it sits, is a non-viable business.
I refuse to include the sellers of shovels as AI revenue.
If the companies buying the shovels are still losing money, then the tool suppliers' fortunes have nothing to do with the economics of the AI application layer, which is losing money on every prompt.
There's clearly also a lot of pent-up demand in the corporate world for inference; the problem is that it's currently expensive enough that enterprises are balking at the cost before they've had a chance to refine processes and see projects through to fruition. That's a tractable problem to solve, though.
Airlines, for example, which are so profitable they continually go bankrupt.
If you were to take 500 computers with older 1080 GPUs, you might have enough compute/RAM to be equivalent to an H200 GPU for training such a model. Maybe take 10000.
But if those machines are spread over 10000 homes, wired with residential internet service, training a large model will not get anywhere.
You go from "data in the same HBM memory chip" at 4.8 TB/s or "data in adjacent GPU" with NVLink at 1.2 TB/s down to 25 Mbit/s upload speed. Accessing the next piece of data is going to be about a million times slower. At the same time you will generate a thousand times more heat, for a million times longer.
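A quick back-of-envelope check of the "about a million times slower" figure, using only the bandwidth numbers quoted above. The 1 GB payload is a hypothetical size picked purely for illustration, and the helper names `slowdown` and `transfer_seconds` are mine.

```python
# Bandwidth figures as quoted in the comment; units are bytes/s unless noted.
HBM_BYTES_PER_S = 4.8e12        # "data in the same HBM memory chip", 4.8 TB/s
NVLINK_BYTES_PER_S = 1.2e12     # "data in adjacent GPU", 1.2 TB/s
UPLOAD_BITS_PER_S = 25e6        # residential upload, 25 Mbit/s

# Convert the upload link from bits to bytes so the units match.
UPLOAD_BYTES_PER_S = UPLOAD_BITS_PER_S / 8


def slowdown(fast_bytes_per_s, slow_bytes_per_s):
    """How many times slower the slow link is than the fast one."""
    return fast_bytes_per_s / slow_bytes_per_s


def transfer_seconds(payload_bytes, bytes_per_s):
    """Time to move a payload at a given sustained bandwidth (latency ignored)."""
    return payload_bytes / bytes_per_s


if __name__ == "__main__":
    payload = 1e9  # hypothetical 1 GB of data to exchange (assumption)
    print("HBM vs upload slowdown:   ", slowdown(HBM_BYTES_PER_S, UPLOAD_BYTES_PER_S))
    print("NVLink vs upload slowdown:", slowdown(NVLINK_BYTES_PER_S, UPLOAD_BYTES_PER_S))
    print("1 GB over NVLink (s):     ", transfer_seconds(payload, NVLINK_BYTES_PER_S))
    print("1 GB over upload (s):     ", transfer_seconds(payload, UPLOAD_BYTES_PER_S))
```

With those inputs the ratio comes out at roughly 1.5 million against HBM and roughly 400 thousand against NVLink, so "about a million" is the right order of magnitude. This only counts bandwidth; round-trip latency over the public internet would make the gap worse.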
One would question why this hasn't already become the rule, as opposed to the proliferation of private data centers. However, I am sure the answers are plain and perhaps saddening to us all.
The thing that big models will always bring to the table is the ability to YOLO weak/under-specified prompts, and spend less time in the loop making sure work gets partitioned correctly. For smaller/simpler tasks the P(success) difference isn't that big.
Disagreed. GLM-5.1 is easily as good as Opus 4.5 for all the coding purposes I could throw at it, which is the model that kicked this entire hype cycle into overdrive in the first place.
One being that extrapolating from like 3 data points is hardly science. All trends break at some point.
The other is that the measures to prevent distillation of their models (if it was a secret sauce of Chinese models) could work if nobody is allowed to use them.
I mean, that's good, but they'd also have to build their own dataset, which involves either paying people or breaking the law.
Plus if they do manage to make it work, they will not get any tax revenue from it, as it'll remove the need for labour, which is where a huge amount of tax revenues come from.
It's a deeply hard problem with lots of second/third order effects.
The first part is not really true, though: the chips are not that much faster, the DRAM is not that much faster, and in aggregate it does not matter because there is just so much more consumer hardware out there (although perhaps that is changing as supply shifts toward datacenters).
The interconnect and data locality are the problem. If you could train it the way you can render a scene with Monte Carlo ray tracing, for example, any result from any node could be merged with any other and the combined result would have converged closer to the limit. I am sure research in that direction exists; it just has not proven effective at the scales where it has been attempted.
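To make the ray-tracing analogy concrete, here is a toy of the property being described: independent nodes each produce a partial Monte Carlo result, and any two partials can be merged in any order to get a better estimate. It estimates pi rather than rendering a scene, and `Partial`, `run_node`, and `merge_all` are names I made up for the sketch. Nothing here claims that gradient-based training has this property; that is exactly the open problem.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """A node's partial result: samples inside the quarter circle, and total samples."""
    hits: int
    samples: int

    def merge(self, other):
        # Merging is just adding counts, so it is commutative and associative:
        # nodes never need to coordinate or exchange anything but this pair.
        return Partial(self.hits + other.hits, self.samples + other.samples)

    def estimate(self):
        return 4.0 * self.hits / self.samples


def run_node(seed, samples):
    """Simulate one node working alone with its own random stream."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return Partial(hits, samples)


def merge_all(partials):
    total = Partial(0, 0)
    for p in partials:
        total = total.merge(p)
    return total


if __name__ == "__main__":
    partials = [run_node(seed, 20_000) for seed in range(8)]
    print("single node estimate:", partials[0].estimate())
    print("merged estimate:     ", merge_all(partials).estimate())
```

Each node ships two integers no matter how much work it did, which is why this kind of workload tolerates slow links. Synchronous gradient descent, by contrast, needs every step's result before the next step can start.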