by aspenmartin 16 hours ago

by strange_quark 14 hours ago

What rapid improvement has occurred? In this six-month AI coding fever dream we've been living in, I really haven't seen anything new in a while, either in new ideas for AI coding or in new consumer products and services.

I'll grant you that the coding harnesses themselves are better, because that was a new product category with a lot of low-hanging fruit, but have the models actually improved in a way that isn't just benchmaxxing? I'd argue the models seem to be regressing. Even the most AI-pilled people at my company have all complained that Opus 4.7 is a dud. Anecdotally, GPT 5.5 seems decent, but it's rumored to be a 10T-parameter model, isn't noticeably better than 5.4 or 5.3, is insanely expensive to use, and seems to be experiencing model collapse, since the system prompt has to beg the thing not to talk about goblins and raccoons.

by jatora 14 hours ago

This is the uninformed opinion of someone who clearly doesn't use AI coding tools consistently. And why are you limiting it to 6 months? What's wrong with you?

by weakfish 28 minutes ago

Why does this _always_ happen in agentic coding convos?

> I don't find $MODEL useful

> CLEARLY you're doing it wrong

It's so dumb.

(I write code w/ agents btw, I'm just also skeptical)

by zapataband1 13 hours ago

How many years of real-life, in-production problem solving/coding have you done? That's what I base how informed you are on, not how much you use your favorite new $100/month token-prediction subscription.

by jatora 11 hours ago

15 years. But that's irrelevant to this point. The person I'm replying to clearly doesn't use the tools if they think there hasn't been constant improvement. "token-prediction subscription" is funny, coming from a glorified biological token predictor.

by slopinthebag 4 hours ago

I'm starting to think the AI maxis are just misanthropes.

by zapataband1 9 hours ago

Ah yes, another feeble fool who thinks his $100 subscription is equivalent to 400 billion years of evolution simply because he is stupid and watches a lot of sci-fi.

by jatora 7 hours ago

Nope, not at all. But it's most certainly superior to the tokens your neural net outputs.

by viking123 5 hours ago

say that you are alive

"i am alive"

OH MY GOD!!

by aspenmartin 12 hours ago

I’m going to parrot back what you’re saying and you tell me if I’m getting close

- AI coding is a disappointing fad (“fever dream?”)
- that has not made meaningful progress in…6 months?
- coding harness is improving
- model improvements are lies: it’s just businesses “benchmaxxing” and misleading people. Real performance has not meaningfully improved
- “opus 4.7 is a dud”
- 5.5 suffering from “system collapse” (I’ve never heard this term before)

Since you asked and I assume you are rational and really are interested to know:

- we have many measures of performance and have studied how one particularly important but unintuitive measure (pretraining perplexity) scales with data, compute, and model size. These laws continue to hold and have satisfying theoretical origins.

- whatever the scale of 5.5, consider we have far more room to go on the scaling front. Probably another 2-3 orders of magnitude before we hit limiting bottlenecks.

- that’s also fine because scaling is only part of the puzzle. RL on verifiable rewards is virtually guaranteed to get you optimal performance and that’s the entirety of the excitement around coding agents

- while you are right about benchmarks and measurement science having a ton of weaknesses, they are not at all garbage. There are probably around 40,000 benchmarks in the literature (this is not a made-up number, by the way; it really is around that many). Epoch made a great composite measure using good stats (item response theory, IRT) called their Epoch Capabilities Index, and METR has done and redone their time horizon measure and it holds up beautifully. There is a ton of signal in many benchmarks, and they all tell a pretty compelling story.

- additionally, this is not some unknowable thing. It strikes me as odd that people’s prior on HN a lot of the time is “it’s all dumb rich people putting way too much dumb money in this”. Sorry, but the world is not that dumb. Trillions of CapEx is usually pretty rationally allocated. And it is!

- why? Because it is already known what happens when you do what we’re doing. When you have a verifiable reward system, a certain amount of compute available, and seed data to get you to where you can do RL, you are almost guaranteed to get superhuman performance.
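
For readers unfamiliar with item response theory (IRT), the statistical approach credited above to Epoch's composite index, here is a minimal sketch of a standard two-parameter logistic (2PL) IRT model. The item parameters, outcomes, and function names are invented for illustration; this is not Epoch's actual model, fitting procedure, or data. Each benchmark item gets a difficulty and a discrimination, and a model's latent ability is estimated from which items it solves.

```python
import math


def p_solve(theta: float, a: float, b: float) -> float:
    """2PL item response function: probability that a model with latent
    ability `theta` solves an item with discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_ability(outcomes: list[int], items: list[tuple[float, float]],
                     lr: float = 0.1, steps: int = 2000) -> float:
    """MAP estimate of theta by gradient ascent, item parameters held fixed.

    outcomes: 1 if the model solved the item, else 0.
    items: hypothetical (a, b) pairs, one per outcome.
    A standard-normal prior on theta keeps the estimate finite even when
    every outcome is 0 or every outcome is 1 (plain MLE would diverge)."""
    theta = 0.0
    for _ in range(steps):
        # Gradient of the Bernoulli log-likelihood under 2PL is sum(a * (y - p));
        # the "- theta" term is the gradient of the N(0, 1) log-prior.
        grad = sum(a * (y - p_solve(theta, a, b))
                   for y, (a, b) in zip(outcomes, items)) - theta
        theta += lr * grad
    return theta


# Hypothetical items as (discrimination, difficulty), ordered easy to hard.
ITEMS = [(1.0, -1.0), (1.2, 0.0), (1.5, 1.0), (2.0, 2.0)]

weaker = estimate_ability([1, 1, 0, 0], ITEMS)
stronger = estimate_ability([1, 1, 1, 0], ITEMS)
print(f"weaker model: {weaker:.2f}, stronger model: {stronger:.2f}")
```

The design point is that every item feeds one shared ability scale, weighted by its own difficulty and discrimination, which is what lets results from heterogeneous benchmarks be combined into a single number.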

by jatora 7 hours ago

I'm pretty sure their mindset is pure cope. All top AI labs are agentically coding 100% now. There's a reason for that. Anyone not on that paradigm yet is either slow acting or purposefully resistant. (excluding workplace policies that hamstring you of course)

by aspenmartin 59 minutes ago

Yeah, that’s what I just can’t wrap my mind around. It’s a cacophony of engineers with authoritative-sounding blog posts explaining a subject they seem to have only a tenuous grasp on. It’s hard to watch a population of tech people I used to really revere get things so wrong. I thought “surely once we’re <literally where we are today, which is what you describe> no one with any self-respect would still claim AI is a useless fad or that it shouldn’t be used”, and yet, to my disappointment, that’s where we seem to be.

by cyclopeanutopia 15 hours ago

Perhaps you are confusing performance with instability?

by aspenmartin 12 hours ago

No. The time horizon I’m talking about spans years. “We don’t know” is just wrong: we’ve had scaling laws for many years and they continue to hold up. Benchmarks, in all their ugliness, tell a consistent story.

by paganel 15 hours ago

> very clear and extremely rapid improvement in a startlingly short amount of time.

We're almost 6 months into all this AI-code madness and I've yet to see that "rapid improvement" you mention, as in software products that are genuinely better compared to 6 months ago, or new software products (and good software products at that) that would not have existed had this AI craze not happened.

by aspenmartin 12 hours ago

Way more than six months. You may be talking about how the world looks from your vantage point, as well you should. But there’s a reason why the world doesn’t allocate trillions of dollars of capital based on that.

I really value skeptical people and skepticism generally. But what I think skeptical people would prefer to consider themselves is: rational and reasonable, with their beliefs well calibrated.

You’re not the only one to think that literally nothing major or significant has happened with AI, but that’s simply wrong. Every major tech company - the ones poised to get the first best rewards, have already gotten good incremental revenue from AI via ads ranking/recommendations (Google, Meta, etc.), good productivity increases due to scale of workforce and advanced in-house tooling. You won’t see these numbers and you don’t have to believe them. But I have seen them and I believe them, and I, like you, hate bullshit.

by ThrowawayR2 10 hours ago

Classic argumentum ad populum fallacy. The world allocated the equivalent of trillions to the dotcom bubble shortly before it became the dotcom bust, mortgage CDOs before the 2008 debt crisis, and the cryptocurrency mania before its bubble popped. The world has allocated vast sums of money to rather stupid things many, many times in the past.

by brazukadev 10 hours ago

> Every major tech company - the ones poised to get the first best rewards, have already gotten good incremental revenue from AI via ads ranking/recommendations (Google, Meta, etc.)

That's just software evolving. It happened before LLMs, it would happen without LLMs.

> good productivity increases due to scale of workforce and advanced in-house tooling.

Exactly same case.

by aspenmartin 1 hour ago

But I don’t really understand: the ask is for evidence that AI is generating meaningful returns, and it demonstrably is, even while we have integrated these tools only partially. “Just software evolving”: um, yes, I agree, except that now this happens faster and more efficiently. It is also more than that: the models that power advertising and content recommendation at TikTok, Google, Facebook, Instagram, etc. are not just “software evolving”; the improvements to those models are meaningful and only possible with good AI.

by handoflixue 12 hours ago

Can you name literally any other technology that had hundreds of millions of users within the first six months of being invented?

Six months after the internet was invented, you could send email between a few universities.

Six months after the computer was invented, they still hadn't actually built one.

The first transcontinental railroad took about six YEARS just to build.

by nothinkjustai 9 hours ago

GPT did not have hundreds of millions of users when it was invented almost a decade ago…

by handoflixue 1 hour ago

If you want to move the goalposts, that's fine, but acknowledge it: the original claim I'm responding to was "We're almost 6 months into all this AI-code madness".

If you want to set GPT as the target, that's even easier! In that decade it has passed the Turing Test, solved novel open math problems, generates audio, video, and music, and can write coherent code. Again, there is no technology that has improved more rapidly than LLMs.

by leptons 13 hours ago

You say this as though AI company debt has not followed a very clear and extremely rapid ballooning in a startlingly short amount of time.

It's the "YOLO" of business strategies.

by aspenmartin 12 hours ago

It always amazes me how we’re on a platform with “ycombinator” in the URL and people don’t understand how private companies scale to capture market share. You’re right, Uber was that company that ran at a loss for so long and collapsed, another YOLO business strategy. Or maybe it was Amazon or…hmm, I forget.

by slopinthebag 15 hours ago

Yes but we don't know the shape of the curve and where we are on it.

by aspenmartin 12 hours ago

See the Chinchilla scaling laws: we have the functional form of the curve and know the constants (though they change and are domain- and model-specific):

L(N,D) ~= 1.69 + 406 / N^0.339 + 411 / D^0.285

where L is the loss (pre-training test loss), N is the number of model parameters, and D is the scale of the data (the number of training tokens).
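
The fitted curve above can be evaluated directly. The sketch below plugs in the constants as quoted and also derives a compute-optimal split between N and D, assuming the common approximation that training compute is C ≈ 6ND FLOPs (an assumption not stated in the comment). The function names are hypothetical, and the numbers are only as good as the fit.

```python
# Constants as quoted in the comment above (Chinchilla-style parametric fit).
E, A, B = 1.69, 406.0, 411.0   # irreducible loss and the two scale coefficients
ALPHA, BETA = 0.339, 0.285     # exponents for parameters (N) and data (D)


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pre-training test loss L(N, D)."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def compute_optimal(flops: float) -> tuple[float, float]:
    """Minimize L(N, D) subject to 6 * N * D = flops (assumed cost model).

    Substituting D = flops / (6N) and setting dL/dN = 0 gives
    N_opt = G * (flops / 6) ** (BETA / (ALPHA + BETA)),
    with G = (ALPHA * A / (BETA * B)) ** (1 / (ALPHA + BETA))."""
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n_opt = g * (flops / 6.0) ** (BETA / (ALPHA + BETA))
    d_opt = flops / (6.0 * n_opt)
    return n_opt, d_opt


# A 70B-parameter model trained on 1.4T tokens.
print(f"L(70e9, 1.4e12) = {chinchilla_loss(70e9, 1.4e12):.3f}")

# Each 10x increase in parameters at fixed data buys a smaller loss reduction.
for n in (1e9, 1e10, 1e11, 1e12):
    print(f"N = {n:.0e}: L = {chinchilla_loss(n, 1.4e12):.3f}")

n_opt, d_opt = compute_optimal(1e24)
print(f"1e24 FLOPs -> N ~ {n_opt:.2e} params, D ~ {d_opt:.2e} tokens")
```

With these constants the 70B/1.4T case comes out at roughly 1.92, and no finite N or D pushes the prediction below the irreducible term E = 1.69.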

by slopinthebag 4 hours ago

You need to touch grass dude, seriously.

by aspenmartin 1 hour ago

Why deflect from the conversation and attempt to insult someone? What I’m saying is literally canonical and extremely well known literature.
