upvote
Uninformed opinion of someone who clearly doesnt consistently use AI coding tools, clearly. And why are you limiting it to 6 months? Whats wrong with you?
reply
Why does this _always_ happen in agentic coding convos?

> I don't find $MODEL useful

> CLEARLY you're doing it wrong

It's so dumb.

(I write code w/ agents btw, I'm just also skeptical)

reply
How many years of real-life, in-production problem solving/coding have you done? That's what I base how informed you are not how much you use your favorite new $100/month token-prediction subscription
reply
15 years. But that's irrelevant to this point. The person im replying to clearly doesnt use the tools if they think there hasnt been constant improvement. "token-prediction subscription" is funny, coming from a glorified biological token predictor
reply
I'm starting to think the AI maxis are just misanthropes.
reply
ah yes another feeble fool that thinks his 100$ subscription is equivalent to 400 billion years of evolution simply because he is stupid and watches a lot of scifi.
reply
Nope not at all, but it's most certainly superior to the tokens your neural net outputs
reply
say that you are alive

"i am alive"

OH MY GOD!!

reply
deleted
reply
I’m going to parrot back what you’re saying and you tell me if I’m getting close

- AI coding is a disappointing fad (“fever dream?”). - that has not made meaningful progress in…6 months? - coding harness is improving - model improvements are lies: it’s just businesses “benchmaxxing” and misleading people. Real performance has not meaningfully improved - “opus 4.7 is a dud” - 5.5 suffering from “system collapse” (I’ve never heard this term before)

Since you asked and I assume you are rational and really are interested to know:

- we have many measures of performance and have studied how one particularly important but unintuitive measure (pertaining perplexity) scales with data, compute, and model size. These laws continue to hold and have satisfying theoretical origins.

- whatever the scale of 5.5, consider we have far more room to go on the scaling front. Probably another 2-3 orders of magnitude before we hit limiting bottlenecks.

- that’s also fine because scaling is only part of the puzzle. RL on verifiable rewards is virtually guaranteed to get you optimal performance and that’s the entirety of the excitement around coding agents

- while you are right about benchmarks and measurement science having a ton of weaknesses, they are not at all garbage. There are probably around 40,000 benchmarks in the literature (this is not a made up number by the way it really is around that many). Epoch made a great composite measure using good stats (IRT) called their epoch capability index, METR has done and redone their time horizon measure and it holds up beautifully. There is a ton of signal in many benchmarks and they all tell a pretty compelling story.

- additionally, this is not some unknowable thing. It strikes me as odd that people’s prior on HN a lot of time is “it’s all dumb rich people putting way too much dumb money in this”. Sorry but the world is not that dumb. Trillions of CapEx is usually pretty rationally allocated. And it is!

- why? Because this is already known what happens when you do what we’re doing. When you have a verifiable reward system, have a certain amount of compute available, have seed data to get you to where you can do RL, you will be almost guaranteed to get superhuman performance

reply
I'm pretty sure their mindset is pure cope. All top AI labs are agentically coding 100% now. There's a reason for that. Anyone not on that paradigm yet is either slow acting or purposefully resistant. (excluding workplace policies that hamstring you of course)
reply
Yea that’s what I just can’t wrap my mind around. It’s a cacophony of engineers with authoritative sounding blog posts explaining a subject they seem to have barely a tenuous grasp on. It’s hard to watch a population of tech people I used to really revere getting things so wrong. I thought “surely once we’re <literally where we are today which is what you describe> no one with any self respect would still claim AI is a useless fad or that it shouldn’t be used” and yet to my disappointment that’s where we seem to be.
reply