If you think of it from the point of view of the universal approximation theorem, it's all efficiency optimisation. We know that it works if we do it incredibly inefficiently.

Every architecture improvement is essentially a way to achieve the capability of a single fully connected hidden layer of width n, but with fewer parameters.

Given these architectures usually still contain fully connected layers, unless they've done something really wrong, they should still be able to do anything if you make the entire thing large enough.

That means a large enough [insert model architecture] will be able to approximate any function to arbitrary precision. As long as the architecture's efficiency gains are retained as the scale increases, they should be able to get there quicker.
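To make the "incredibly inefficient but it works" point concrete, here is a minimal sketch for the one-dimensional case only. It builds a single hidden layer of n ReLU units by hand (no training) so that the network linearly interpolates a target function at evenly spaced knots. The target (`math.sin`), the interval, and all function names are assumptions picked for illustration; the theorem itself covers far more general settings.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_net(f, lo, hi, n):
    """One hidden layer with n ReLU units that linearly interpolates f
    at n + 1 evenly spaced knots on [lo, hi]. Returns (bias, knots, coeffs)."""
    step = (hi - lo) / n
    knots = [lo + i * step for i in range(n + 1)]
    values = [f(k) for k in knots]
    slopes = [(values[i + 1] - values[i]) / step for i in range(n)]
    # The output weight of unit i is the change in slope at knot i.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n)]
    return values[0], knots[:n], coeffs

def predict(net, x):
    bias, knots, coeffs = net
    return bias + sum(c * relu(x - k) for k, c in zip(knots, coeffs))

def max_error(f, net, lo, hi, samples=2001):
    """Largest absolute error on an evenly spaced grid of sample points."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - predict(net, x)) for x in xs)

# Wider hidden layers give smaller worst-case error on the sampled grid.
errors = {
    n: max_error(math.sin, build_relu_net(math.sin, -math.pi, math.pi, n),
                 -math.pi, math.pi)
    for n in (4, 16, 64, 256)
}
```

The error shrinks as the width grows, but only by throwing more units at the problem, which is the inefficiency better architectures are trying to avoid.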

by ertgbnm 9 hours ago

Most breakthroughs that are published are for efficiency because most breakthroughs that are published are for open source.

All the foundation model breakthroughs are hoarded by the labs doing the pretraining. That being said, RL reasoning training is the obvious and largest breakthrough for intelligence in recent years.

by WarmWash 7 hours ago

With all the floating around of AI researchers though, I kind of wonder how "secret" all these secrets are. I'm sure they have internal siloing, but even still, big players seem to regularly defect to other labs. On top of this, all the labs seem to be pretty neck and neck, with no one clearly pulling ahead across the board.

by irthomasthomas 9 hours ago

Efficiency gains can be used to make existing models more profitable, or to make new larger and more intelligent models.

by cubefox 7 hours ago

Some yes, others no. Distillation and quantization can't be used to make new base models since they require a preexisting one.
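A rough sketch of why quantization presupposes an existing model: the procedure takes already-trained weights as its input and only re-encodes them at lower precision. The scheme below (symmetric per-tensor int8) and the weight values are assumptions chosen for illustration, not any particular library's method.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization of already-trained weights."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the integer codes back to approximate float weights."""
    return [v * scale for v in q]

trained = [0.82, -1.27, 0.05, 0.4]  # hypothetical weights from a pretrained model
q, scale = quantize_int8(trained)
restored = dequantize(q, scale)
```

Nothing here learns anything new: `restored` can only ever be a coarser copy of `trained`.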

by irthomasthomas 4 hours ago

It enables models larger than were previously possible.

by cubefox 4 hours ago

No, because the base model from which the distilled or quantized models are derived is larger.

by cubefox 7 hours ago

> What are the most important breakthroughs from the past two or three years for intelligence?

The most important one in that timeframe was clearly reasoning/RLVR (reinforcement learning with verifiable rewards), which was pioneered by OpenAI's Q* aka Strawberry aka o1.