upvote
Yeah, I don't really accept the argument that AI makes mistakes and therefore cannot be trusted to write production code (in general, at least - obviously depends on the types of mistakes, which code, etc.).

The reality is we have built complex organizational structures around the fact that humans also make mistakes, and there's no real reason you can't use the same structures for AI. You have someone write the code, then someone does code review, then someone QAs it.

Even after it goes out to production, you have a customer support team and a process for them to file bug tickets. You have customer success managers to smooth over the relationships with things go wrong. In really bad cases, you've got the CEO getting on a plane to go take the important customer out for drinks.

I've worked at startups that made a conscious decision to choose speed of development over quality. Whether or not it was the right decision is arguable, but the reality is they did so knowing that meant customers would encounter bugs. A couple of those startups are valuable at multiple billions of dollars now. Bugs just aren't the end of the world (again, most cases - I worked on B2B SaaS, not medical devices or what have you).

reply
> humans also make mistakes

This is broadly true, but not comparable when you get into any detail. The mistakes current frontier models make are more frequent, more confident, less predictable, and much less consistent than mistakes from any human I'd work with.

IME, all of the QA measures you mention are more difficult and less reliable than understanding things properly and writing correct code from the beginning. For critical production systems, mediocre code has significant negative value to me compared to a fresh start.

There are plenty of net-positive uses for AI. Throwaway prototyping, certain boilerplate migration tasks, or anything that you can easily add automated deterministic checks for that fully covers all of the behavior you care about. Most production systems are complicated enough that those QA techniques are insufficient to determine the code has the properties you need.

reply
> The mistakes current frontier models make are more frequent, more confident, less predictable, and much less consistent than mistakes from any human I'd work with.

my experience literal 180 degrees from this statement. and you don’t normally get the choose humans you work with, some you may be involved in the interview process but that doesn’t tell you much. I have seen so much human-written code in my career that, in the right hands, I’ll take (especially latest frontier) LLM written code over average human code any day of the week and twice on Sunday

reply
Humans also make mistakes, but unlike LLMs, they are capable of learning from their mistake and will not repeat it once they have learned. That, not the capacity to make mistakes, is why you should not allow LLMs to do things.
reply
Developers repeat the same mistakes all the time. Otherwise off by one wouldn’t be a thing.
reply
most human bugs are caused by failures in reasoning though, not by just making something up to leap to the conclusion considered most probable, so not sure if the comparison makes sense.
reply
> most human bugs are caused by failures in reasoning though

Citation needed.

reply
sorry, that is just taken from my experience, and perhaps I am considering reasoning to be a broader category than others might.

To be lenient I will separate out bugs caused by insufficient knowledge as not being failures in reasoning, do you have forms of bugs that you think are more common and are not arguably failures in reasoning that should be considered?

on edit: insufficient knowledge that I might not expect a competent developer to have is not a failure in reasoning, but a bug caused by insufficient knowledge that I would expect a competent developer in the problem space to have is a failure in reasoning, in my opinion on things.

reply