False. The absolute capability is irrelevant; with the proper harness, 31B is more than adequate for a very large portion of the tasks I ask AI to do. The metric isn't how good the model is at Erdős problems, it's how reliably it can remove drudgery in my life. It just autonomously reverse engineered a Bluetooth protocol with minimal intervention, and its ability to react to data and ground itself is constantly impressive to me. I do a ton of testing with these models; today I had Gemma answer a physics problem that Opus 4.7 gave up on. With a decent harness and context, the set of tasks where both models are good enough is surprisingly large. The tasks I have that stump Gemma often also stump Opus 4.7.

by diordiderot 2 hours ago

Maybe reaching for an analogy would be helpful here.

Thot_experiment is saying that his 2016 Toyota Prius is a great and reliable car for his daily commute and running errands.

Whereas everyone else is screeching about its capability gap with a Lockheed Martin F-35 Lightning.

by amelius 4 hours ago

This is like saying that 640kB is enough for anybody.

by thot_experiment 4 hours ago

No, it isn't. I am saying that the set of tasks that can be completed by Opus 4.7 has a surprisingly large overlap with the set of tasks that can be completed by Gemma 31B. It is meaningfully equivalent in many cases.

(Of course, if I'm being honest, 640kB is fine. I'm sure tons of the world's commerce is handled by less. The delta between a system with 640kB of RAM and a modern one is near nil for many people: the UX on a PoS terminal does not require more than that, for example, and the Hacker News UX could also be roughly the same.)

by BoredomIsFun 4 hours ago

It would be true if model providers did not throttle their models. I do not have definitive proof that they do, but the rumors are abundant.