The difference is a lot more than just throwing scale at it; pretty much everything useful comes from an evolving landscape of post-training techniques.

Of course, param count and context length also matter because they increase the model's overall fidelity, but a base model without SFT, RLHF, etc. is effectively useless.

by darth_avocado 1 hour ago

Correct. That is what I was trying to hint at. Yes, massive compute is needed to train AI, but it isn't the only thing. A lot of research and experimentation goes into moving the marker just a little bit. Innovation can't be forced into weekly sprints; it takes its own time.

by paytonjjones 1 hour ago

Research and experimentation on neural nets have been going on since the '70s (arguably even earlier), but the lion's share of capability gains has come in the last couple of years.

Scale was really the unlock. The new pre- and post-training techniques and architectures are very cool and useful, but they definitely aren't the differentiators when comparing to the previous era of NLP.

by codemog 3 hours ago

They already tried that with GPT-4 and GPT-4.5.

They were allegedly massive but the cost and returns were not worth it.