Maybe not AGI, but if you look at the differences between, say, GPT-2 and GPT 5.5, it's remarkable how well it works to mostly just throw scale at the problem.
reply
The difference is a lot more than just throwing scale at it; pretty much everything useful comes from an evolving landscape of post-training techniques.

Of course, param count and context length are also important because they increase the model's overall fidelity, but a base model without SFT, RLHF, etc. is effectively useless.

reply
Correct. That is what I was trying to hint at. Yes, massive compute is needed to train AI, but it isn’t the only thing. A lot of research and experimentation goes into moving the marker just a little bit. Innovation can’t be forced into weekly sprints; it takes its own time.
reply
Research and experimentation on neural nets have been going on since the 70s (arguably even earlier), but the lion's share of capability changes has come in the last couple of years.

Scale was really the unlock; the new pre- and post-training techniques and architectures are very cool and useful, but they definitely aren't the differentiators when comparing to the previous era of NLP.

reply
They already tried that with GPT-4 and GPT-4.5.

They were allegedly massive, but the returns were not worth the cost.

reply