by jwpapi 11 hours ago |

by jwpapi 2 hours ago |

Wow so many replies.

I think it comes down to two camps: people saying AI is improving on these issues, and people countering that.

I don’t know for sure, but to me it seems the last 2 years weren’t necessarily 'intelligence' improvements but post-training improvements and tool connections, plus reduced censorship.

I’m now using less AI than ever, and I was burning 1000 USD/month before Claude Code. I have a couple of really fundamental functions built that help me solve a big chunk of specific problems, and I can build a lot on that. Adding functionality became easier, not more complicated.

I would think that for the business problems I’m facing, AI is right less than 30% of the time. For example, deciding how to set up databases for max efficiency, or how to write efficient queries. Everything that in the end is really your moat compared to your vibe-coded competitors.

From my personal experience, I’ve seen a lot of vibe-coded companies get stuck and barely add necessary functionality or features, and my guess is that they don’t trust changes anymore.

So even if AI were as good as a really good coder, one thing would still be missing: a person who knows exactly what is happening.

And I mean, okay, it might write a form real quick. But a modern form needs to do a lot of things, and if you have established patterns for all kinds of inputs, the implementation is mundane.

It’s like when you learn coding: you type it yourself to learn. So if you can’t scale the AI-only codebase, at some point you have to learn it, and I’d argue that right now the most efficient way is to write in it.

And I’m also arguing that it’s really tough to get software so good that it’s actually an asset on the market when it is vibe-coded only. It seems like it’s more of a drug for wannapreneurs than it is actually building an asset.

Like it builds you a Netflix clone, but what you see is barely the code you need to write a Netflix competitor.

by onionisafruit 8 hours ago |

I know it’s not your main point, but I’m curious where $300/line comes from. I don’t think I’ve ever seen a dollar amount attached to a line of production code before.

by aspenmartin 10 hours ago |

I think this sounds like a true yet short-sighted take. Keep in mind these features are immature, but they exist to get a flywheel going and corner the market. I don’t know why, but people seem to consistently miss two points and their implications:

- performance is continuing to increase incredibly quickly, even if you rightfully don’t trust a particular evaluation. Scaling laws like Chinchilla and RL scaling laws (both training and test time) point the same way

- coding is a verifiable domain

The second one is the most important. Agent quality is NOT limited by human code in the training set. That code is simply used for efficiency: it gets you to a good starting point for RL.
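
To make "verifiable" concrete, here is a minimal sketch, assuming the reward is simply the fraction of test cases a candidate passes. The function and test names are hypothetical and not taken from any particular RL framework; real setups are far more involved (sandboxing, timeouts, hidden tests).

```python
from typing import Callable, Iterable, Tuple

# Hypothetical test case shape: (input arguments, expected output).
TestCase = Tuple[tuple, object]


def verifiable_reward(candidate: Callable, tests: Iterable[TestCase]) -> float:
    """Return the fraction of test cases the candidate passes (0.0 to 1.0)."""
    tests = list(tests)
    if not tests:
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            # A crashing candidate fails that test; the checker keeps going.
            pass
    return passed / len(tests)


# Two hypothetical candidates for "add two numbers".
def good_add(a, b):
    return a + b


def buggy_add(a, b):
    return a - b


ADD_TESTS = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
```

The point of the sketch is that the score comes from running the code, not from comparing it to human-written code, which is why the signal can in principle keep improving past the training data.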

The claim that carries the burden of proof is that things will not reach superhuman performance, INCLUDING all end-to-end tasks: understanding a vague, poorly articulated business objective, architecting a system, building it out, testing it, maintaining it, fixing bugs, adding features, refactoring, etc. We can literally predict performance (although that prediction has a complicated relationship with benchmarks and real-world performance).

Yes, definitely, error rates are still too high for this to be totally trusted end to end. But the error rates are improving consistently, and this is what explains the METR time horizon benchmark.