You don't think these errors compound? Generated code involves hundreds of little decisions. Yes, it "usually" works.
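
To put rough numbers on the compounding worry, here is a back-of-the-envelope sketch. The 99% per-decision accuracy and the independence of decisions are assumptions made up for illustration, not measurements of any model, and `chance_all_correct` is a hypothetical helper name.

```python
def chance_all_correct(per_decision_accuracy: float, decisions: int) -> float:
    """Probability that every one of `decisions` independent choices is right.

    Assumes each decision succeeds independently with the same probability,
    which is a simplification: real errors are often correlated.
    """
    return per_decision_accuracy ** decisions


if __name__ == "__main__":
    # Assumed accuracy of 99% per decision (illustrative only).
    for n in (10, 100, 300):
        print(f"{n} decisions: {chance_all_correct(0.99, n):.3f}")
```

Under those assumptions, the chance that nothing at all is wrong drops to roughly a third at 100 decisions, which is the sense in which "usually works" per decision stops meaning "usually works" overall.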
LLMs: sometimes wrong but never in doubt.
Not in my experience. With a proper TDD setup, it does better than most programmers at a company, who anecdotally introduce a bug every 2-3 tasks.
The kinds of mistakes it makes are usually strange and inhuman, though. Like getting the hard parts correct while also getting something fundamental about the same problem wrong. And not in the “easy to miss or type wrong” way.

I wish I had an example saved for you, but it happens to me pretty frequently. Not only that, but it also usually gets testing wrong at a fundamental level, or builds tests around incorrect assumptions.
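
A made-up illustration of what "tests built around incorrect assumptions" can look like (not a saved real case; every name here is hypothetical). The buggy function assumes every fourth year is a leap year, and the first test derives its expected values from that same assumption, so it can only agree with the bug.

```python
def is_leap_buggy(year: int) -> bool:
    # Hypothetical generated code: assumes every 4th year is a leap year.
    return year % 4 == 0


def is_leap(year: int) -> bool:
    # Gregorian rule: divisible by 4, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def mirrored_test(fn) -> bool:
    # Expected values are computed from the same assumption as the buggy code,
    # so this test confirms the assumption instead of challenging it.
    return all(fn(y) == (y % 4 == 0) for y in range(1890, 2030))


def independent_test(fn) -> bool:
    # Expected values come from known calendar facts, not from the code's logic.
    known = {1900: False, 2000: True, 2023: False, 2024: True}
    return all(fn(y) == expected for y, expected in known.items())


if __name__ == "__main__":
    print("mirrored test, buggy code:   ", mirrored_test(is_leap_buggy))
    print("independent test, buggy code:", independent_test(is_leap_buggy))
    print("independent test, fixed code:", independent_test(is_leap))
```

The mirrored test passes on the buggy code and would even fail the correct implementation, since 1900 falls in its range. A green test suite only helps if the expected values were checked against something other than the code being tested.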

Yes, just use random results. You’ve just saved yourself weeks or months of work of gathering actual results.