We've gone from "you're holding it wrong" to "the training data was bad because humans suck too". Difference is, humans learn from their mistakes.

by klibertp6 hours ago|

An individual human does (or tends to). Humans as a group, where members join and leave over time, also learn, but at a much slower pace: on a timescale of years to decades. "X things programmers should know about Y" is a template for quite a few very influential blog posts, yet for most of them you can find many programmers, even decades later, who don't actually know what they "should".

My experience has always been that 90% of code is ugly and clunky. I'm not at all surprised, while reviewing AI-generated code, to see much of the same ugliness we regularly commit. The quality of the output code is now consistently average, which means it's basically shit in 90% of cases, but it tends to mostly work (in the general case). The same kind of shit I've seen people push to production thousands of times in my career.

We don't fully know how to write good code. We don't really understand what good code should objectively look like. Spending more time on code doesn't automatically lead to better code (but costs a lot more). Above all, we don't need good code - the business side is perfectly fine with "good enough right now" rather than "maybe a lot better half a year from now". And that's what the models are trained on. They would, indeed, need quite a lot of "emergent properties" to go from that to consistently good code. ASI-level properties, I suspect.

by SilverSlash10 hours ago|

> Difference is, humans learn from their mistakes.

Great! So next time the human will prompt the agent to watch out for and avoid this bug.

by fg1371 hours ago|

Given the amount of training data out there, LLMs should have been perfect by now.

by sdesol6 hours ago|

> Great! So next time the human will prompt the agent to watch out for and avoid this bug.

I actually created a system for something like this. The basic idea is that once you have identified and fixed the issue, you can create lessons that live inside the repository. Lessons are designed to be mapped to one or more files, so if the LLM changes those files again, it can see what the issue was.

The main challenge is summarizing each lesson and creating proper tags so that the AI can easily find it after any code change.
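
The lookup step described above might be sketched roughly as follows. This is not the commenter's actual system: the `Lesson` schema, the `lessons_for` helper, the glob patterns, and the sample lessons are all hypothetical, and matching is done with simple glob patterns from Python's standard library.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Lesson:
    """One recorded mistake and its fix (hypothetical schema)."""
    summary: str
    file_patterns: list[str]  # glob patterns for the files the lesson applies to
    tags: list[str] = field(default_factory=list)


def lessons_for(changed_files: list[str], lessons: list[Lesson]) -> list[Lesson]:
    """Return every lesson with at least one pattern matching a changed file."""
    return [
        lesson
        for lesson in lessons
        if any(
            fnmatch(path, pattern)
            for path in changed_files
            for pattern in lesson.file_patterns
        )
    ]


# Hypothetical lessons that would live in the repository.
LESSONS = [
    Lesson(
        "Cap log retention before writing TRACE output to SQLite.",
        ["src/logging/*.py"],
        ["logging", "disk-usage"],
    ),
    Lesson(
        "Run test commands inside the container, not on the host.",
        ["tests/*"],
        ["docker", "testing"],
    ),
]

# After a change touching a logging file, only the first lesson is surfaced.
matched = lessons_for(["src/logging/sink.py"], LESSONS)
```

Glob matching covers the file mapping only; the harder part named above, writing summaries and tags good enough to retrieve the right lesson, is left out of this sketch.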

by ponector10 hours ago|

You are a senior developer. Please do no mistakes!

by Zenul_Abidin2 hours ago|

I've been bitten by this bug for several days, to the point where I had to write a script to delete the WAL so that my server would stop locking up when Codex logging filled the disk.

You can find it here: https://github.com/openai/codex/issues/28224#issuecomment-47...

I have been making noise about this bug for a week, so I'm glad to see this is blowing up on HN.
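
For readers who want to reclaim the space without deleting the file by hand, here is a minimal sketch of a different approach from the linked script, assuming the log store is an ordinary SQLite database in WAL mode. The `truncate_wal` helper and the path passed to it are hypothetical; `PRAGMA wal_checkpoint(TRUNCATE)` is standard SQLite and asks the engine to flush the WAL into the main database and shrink the WAL file to zero bytes.

```python
import sqlite3


def truncate_wal(db_path: str) -> tuple:
    """Checkpoint the WAL into the main database and truncate the WAL file.

    Returns SQLite's (busy, log_frames, checkpointed_frames) result row.
    A non-zero first value means the checkpoint could not run to completion.
    """
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    finally:
        conn.close()
```

Calling `truncate_wal("/path/to/logs.sqlite")` (a placeholder path) from a cron job would be one way to use it. This only reclaims WAL space; it does nothing about the volume of logs being written in the first place.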

by xpct10 hours ago|

Lack of accountability is the cause here. People don't think before hitting the 'Publish' button. Their managers let them off the hook because the culture still allows making egregious mistakes, as long as there's an LLM to blame.

by applfanboysbgon11 hours ago|

1. I bet that developer only made that mistake one time in their life. Humans learn from their mistakes, LLMs don't. If you rely on LLMs to generate all of your code, you can expect to run into the same issues again and again.

2. "One developer somewhere in the world made a bad mistake one time, so this represents the quality of all software devs everywhere". Maybe they were just a bad developer? Bad developers exist. I have never written a bug that has destroyed my users' hardware, and I think that writing such a bug is completely inexcusable in an enterprise environment with software that will be shipped to millions of users, as Codex is.

by matharmin10 hours ago|

LLMs do learn from mistakes. Not as directly from individual mistakes as humans do, but in aggregate the models have improved much more in the last year than most humans I know have learned in the same time.

by xpct10 hours ago|

I don't like the reframing of 'learning from mistakes' from a human-like, near instantaneous feedback loop, to a year-long process of retraining on many traces collected from user data. They're different concepts and we should refer to them using different phrasing.

by Y-bar8 hours ago|

How many more times do I have to add variations of "do not run any commands for the application without first entering the running container at `docker compose …`" to my AGENTS.md before it learns that node and phpunit are not available outside these containers?

by lifthrasiir11 hours ago|

> I have never written a bug that has destroyed my users' hardware, ...

Probably whoever (human or agent) originally decided to put TRACE logs into SQLite also thought---or reasoned---so. Maybe the decision was right at the time, but the volume of TRACE logs has since increased enormously. You will never know.

by applfanboysbgon10 hours ago|

I love that we've moved the goalposts from "LLMs are better than artisanal software engineers" to "actually, shipping hardware-destroying bugs in production is literally unavoidable, nobody could possibly avoid doing it".

by lifthrasiir10 hours ago|

I only meant what I said. After all, the OP's thesis was that LLMs aren't better than artisanal software engineers, wasn't it? There was no goalpost to move, at least in this particular thread. And the solution might be another agent monitoring those oft-ignored signals.

by da_grift_shift10 hours ago|

What are your thoughts on the SNR of the linked GitHub issue threads? Consider the volume of comments posted and the substance of each comment.

by fn-mote10 hours ago|

I read the first page and they were excellent. Each was clearly written by an experienced dev who knows how to substantiate their claims and propose an acceptable fix that could just be merged.

Your comment, on the other hand, would be improved by including your own opinion on the matter.

by gruez7 hours ago|

> Each was clearly written by an experienced dev

/s?

They're clearly AI-generated.