What amazes me the most is the speed at which things are advancing. Go back a year, or even two, and you can see how all these incremental improvements have compounded. Things that used to require real effort to solve consistently, whether with RAG or context/prompt engineering, have become… trivial. I totally agree with your point that each step along the way doesn't necessarily change that much. But in the aggregate it's sort of insane how fast everything is moving.
reply
The denial of this overall trend on here and in other internet spaces is starting to really bother me. People need to have sober conversations about the speed of this increase and what kind of effects it's going to have on the world.
reply
I use Claude Code every day, and I'm not certain I could tell the difference between Opus 4.5 and Opus 4.0 if you gave me a blind test.
reply
And of course the benchmarks are from the school of "It's better to have a bad metric than no metric", so there really isn't any way to falsify anyone's opinions...
reply
This pretty accurately summarizes all the long discussions about AI models on HN.
reply
Hourly occurrence on /r/codex. Model astrology is about the vibes.
reply
[flagged]
reply
> Who are making these claims? script kiddies? sr devs? Altman?

AI agents, perhaps? :-D

reply
> All anonymous as well. Who are making these claims? script kiddies? sr devs? Altman?

You can take off your tinfoil hat. The same model can perform differently depending on the programming language, the frameworks and libraries employed, and even the project. Context matters too: a model's output varies greatly depending on your prompt history.

reply
It's hardly tinfoil to recognize that companies riding a multi-trillion-dollar funding wave would spend a few pennies astroturfing their stuff on HN, or overfitting to benchmarks that people take as objective measurements.
reply
Given his ramblings on Twitter and the company blog, I bet he shitposts here too.
reply