
> But of course writing code directly will always maintain the benefit of specificity. If you want to write instructions to a computer that are completely unambiguous, code will always be more useful than English.

Unless, at some point, the defect rate for humans becomes greater than that of LLMs. A lot of the claims being made about hallucinations seem to ignore the fact that all software is extremely buggy. I can't use my phone without encountering a few bugs every day.

by idopmstuff 3 hours ago

Yeah, I don't really accept the argument that AI makes mistakes and therefore cannot be trusted to write production code (in general, at least; it obviously depends on the types of mistakes, which code, etc.).

The reality is we have built complex organizational structures around the fact that humans also make mistakes, and there's no real reason you can't use the same structures for AI. You have someone write the code, then someone does code review, then someone QAs it.

Even after it goes out to production, you have a customer support team and a process for them to file bug tickets. You have customer success managers to smooth over the relationships when things go wrong. In really bad cases, you've got the CEO getting on a plane to take the important customer out for drinks.

I've worked at startups that made a conscious decision to choose speed of development over quality. Whether or not it was the right decision is arguable, but the reality is they did so knowing that customers would encounter bugs. A couple of those startups are now valued at multiple billions of dollars. Bugs just aren't the end of the world (again, in most cases; I worked on B2B SaaS, not medical devices or what have you).

by Fishkins 3 hours ago

> humans also make mistakes

This is broadly true, but not comparable when you get into any detail. The mistakes current frontier models make are more frequent, more confident, less predictable, and much less consistent than mistakes from any human I'd work with.

IME, all of the QA measures you mention are more difficult and less reliable than understanding things properly and writing correct code from the beginning. For critical production systems, mediocre code has significant negative value to me compared to a fresh start.

There are plenty of net-positive uses for AI: throwaway prototyping, certain boilerplate migration tasks, or anything you can easily cover with automated deterministic checks that fully capture the behavior you care about. Most production systems are complicated enough that those QA techniques are insufficient to determine whether the code has the properties you need.
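A minimal sketch of what a deterministic check that "fully covers" the behavior can look like, assuming the input domain is small enough to enumerate. `fast_popcount16` is a hypothetical stand-in for generated code (a bit-twiddling popcount for 16-bit integers), and it is compared against a trivially correct reference on all 65,536 possible inputs. Most real functions have input spaces far too large for this kind of exhaustive check, which is why it only fits a narrow class of tasks.

```python
def fast_popcount16(x: int) -> int:
    """Hypothetical 'generated' code: SWAR bit count for 0 <= x < 2**16."""
    x = x - ((x >> 1) & 0x5555)                 # 2-bit field counts
    x = (x & 0x3333) + ((x >> 2) & 0x3333)      # 4-bit field counts
    x = (x + (x >> 4)) & 0x0F0F                 # per-byte counts
    return (x + (x >> 8)) & 0x1F                # sum of both bytes (max 16)


def reference_popcount(x: int) -> int:
    """Obviously correct (but slower) reference implementation."""
    return bin(x).count("1")


def exhaustive_check() -> list[int]:
    """Return every 16-bit input where the two implementations disagree."""
    return [x for x in range(1 << 16) if fast_popcount16(x) != reference_popcount(x)]


if __name__ == "__main__":
    mismatches = exhaustive_check()
    print(f"checked {1 << 16} inputs, {len(mismatches)} mismatches")
```

Because every possible input is tested, an empty mismatch list is a proof of equivalence for this function, not just evidence.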

by bdangubic 3 hours ago

> The mistakes current frontier models make are more frequent, more confident, less predictable, and much less consistent than mistakes from any human I'd work with.

My experience is literally 180 degrees from this statement. And you don't normally get to choose the humans you work with; you may be involved in the interview process for some of them, but that doesn't tell you much. I have seen so much human-written code in my career that, in the right hands, I'll take LLM-written code (especially from the latest frontier models) over average human code any day of the week and twice on Sunday.

by bigstrat2003 1 hour ago

Humans also make mistakes, but unlike LLMs, they are capable of learning from their mistakes and will not repeat them once they have learned. That, not the capacity to make mistakes, is why you should not allow LLMs to do things.

by cactusplant7374 5 minutes ago

Developers repeat the same mistakes all the time. Otherwise off-by-one errors wouldn't be a thing.
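For readers unfamiliar with the term, a minimal illustration of an off-by-one error, using hypothetical function names: the buggy version stops one element early because `range(n - 1)` yields only `n - 1` indices.

```python
def sum_first_n_buggy(values: list[int], n: int) -> int:
    """Intended to sum the first n values, but contains an off-by-one error."""
    total = 0
    for i in range(n - 1):  # bug: yields 0..n-2, so only n - 1 items are summed
        total += values[i]
    return total


def sum_first_n(values: list[int], n: int) -> int:
    """Correct version: range(n) yields exactly the n indices 0..n-1."""
    total = 0
    for i in range(n):
        total += values[i]
    return total


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(sum_first_n_buggy(data, 3))  # 3 (misses the third element)
    print(sum_first_n(data, 3))        # 6
```

The mistake is easy to make and easy to repeat because inclusive and exclusive bounds look almost identical on the page.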

by bryanrasmussen 3 hours ago

Most human bugs are caused by failures in reasoning, though, not by just making something up to leap to the conclusion considered most probable, so I'm not sure the comparison makes sense.

by wiseowise 3 hours ago

> most human bugs are caused by failures in reasoning though

Citation needed.

by bryanrasmussen 3 hours ago

Sorry, that is just from my experience, and perhaps I am treating reasoning as a broader category than others might.

To be lenient, I will set aside bugs caused by insufficient knowledge as not being failures in reasoning. Do you have other kinds of bugs in mind that you think are more common, are not arguably failures in reasoning, and should be considered?

Edit: a bug caused by missing knowledge that I would not expect a competent developer to have is not a failure in reasoning, but a bug caused by missing knowledge that I would expect a competent developer in the problem space to have is, in my opinion, a failure in reasoning.