> But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.

You can absolutely do this. It's even right most of the time.

reply
Let's be real. Most of the time you ask an LLM "Why did you do it like this?", it responds with something along the lines of "Oops. My bad. You're right to point this out."

You even have a fair chance of getting a response like that when there isn't anything wrong and the question wasn't rhetorical, which perfectly illustrates the level of genuine understanding LLMs operate at.

reply
When you criticize AI, always remember that the alternative is the average employee. Today's models are pretty good.
reply
A lot of people think they're above average. A lot of them are wrong.

A lot of average people are producing gigantic messes. At least before this they were gated by their mediocrity.

reply
> the alternative is the average employee. Today's models are pretty good.

I have never seen people anywhere in the world who hate the working class as much as people in the USA do.

In my country the average employee is competent, they do their work and create wealth for the nation.

Again, only in the USA do people think that billionaires are the ones creating value. Total nonsense indoctrination.

reply
I'm not American, nor have I ever worked in the USA. It's not a judgement of human value. It's a judgement of work output.
reply
To adequately validate work you must be at least at the same level, so if you were right (which Dunning-Kruger suggests is unlikely), that would mean your "terrible" average employee is given a tool that will 10x their output, which they cannot even check for correctness. And correctness will be low if the average employee is as bad as you say, because it means they will give badly specified tasks, and even with the best of us it's garbage in, garbage out. I am sure there is no way this can backfire.
reply
All enablers also enable mediocrity. That's not new. At least when the non-mediocre engineer has to work with someone, they can have a tireless responsive partner.

I find this varies by individual, but the AI taking care of so much boilerplate and rote work of coding, and taking the role of architect, test designer, and reviewer is a lot more productive for me. Checking the code may take the same skill, but it's an order of magnitude less work.

reply
Perhaps if you need that much boilerplate it's not going to be a well-architected codebase in the first place. Abstract it out, make a lib out of it. Easier to review & test in separation. Loose coupling, high cohesion.
reply
And have they totally gotten rid of the average employees? Can they already blame the models for the production outages?
reply
I remember hearing (perhaps last year?) that the model companies have specifically tried to obfuscate the "thinking/reasoning" behind the decisions the models make so as to prevent cheaper models from training on the reasoning logs. So asking one "why did you do it like this" might not be fruitful.

Not sure if that's true or if it might be influencing what you're seeing, but it's a thought.

reply
I think that has more to do with the "train of thought" that some models show as what the model is processing before making the response. There shouldn't be a distillation risk with actually asking the model to explain why it made a decision and getting the response.
reply
This has happened to me, so I put this in my global CLAUDE.md, and it seems to help (I don't remember getting the response you mentioned for a while now):

    **Lead with the answer when asked how/which/whether.** Name the command/mechanism first; a question seeking understanding isn't a go-ahead to execute. Answer, then offer to act.
reply
That's because of a fundamental misunderstanding of what an LLM is. The only correct answer to "Why did you do it like this?" is that the specific combination of input text and RNG state caused this particular output. There's no reasoning to be had.

* EDIT * What's with the downvoting? That's a correct description of what happened. You can't ask an LLM why it did something and expect a coherent response, because there's no thinking chain, and no stored thinking state... At best, you can get a reconstruction of how the context relates to the output (basically a summarization of the context).
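
To make the "input text plus RNG state" point concrete, here is a minimal Python sketch. It assumes a made-up, fixed logit vector standing in for a real model, and the names `sample_token`, `generate`, and `toy_logits` are hypothetical, not any vendor's API. It only illustrates that with the same scores and the same seed, sampling yields the same output; it says nothing about how any particular hosted model is actually served.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one token index from softmax(logits / temperature)."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate(logits, steps, seed, temperature=1.0):
    """Draw `steps` tokens from a fixed toy distribution (assumption:
    a real model would recompute logits from the context at each step)."""
    rng = random.Random(seed)  # the RNG state is the only source of variation
    return [sample_token(logits, temperature, rng) for _ in range(steps)]

toy_logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for 4 tokens

run_a = generate(toy_logits, steps=20, seed=42)
run_b = generate(toy_logits, steps=20, seed=42)
assert run_a == run_b  # same input + same seed -> same output
```

Asking "why" afterwards gives the sampler nothing to consult beyond the context and the draws; any explanation is generated the same way as the original output.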

reply
Can't remember the last time that happened.
reply
Happened to me at least three times in the past 14 days. I point out where it made a design decision that causes data loss. «Oops my mistake»
reply
I encounter it constantly with the latest models. Claude is particularly prone to it.

> I shouldn’t have said that with confidence

> I got ahead of myself there

> I overstepped, allow me to correct that

It’s wild seeing how often it’s wrong, and I only know it’s wrong because I am an SME or am actually reading the sources. Most of my coworkers are not SMEs in what they are asking about and do not read the sources.

A huge part of my job now is fixing fuck ups and failures resulting from these slop jockeys who have already moved on to slop up the next task.

reply
So what? That doesn’t negate the value they provide.
reply
I believe the “them” the OP was talking about was referring to the people opening the PRs, not the LLMs.
reply
My mistake, that is definitely a different scene.
reply
And you can certainly tell it the flow you want (and any other constraints) in the prompt.
reply
> But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.

There are plenty of valid criticisms or warnings about over-reliance on AI coding, but this is not one of them. Today, I am using a semi-autonomous agentic coding system which has an `interview` functionality built in - when it spits out the PR from the input, if you have questions about the motivation or context for a particular choice, you can start up a clone of the original agent in a sandbox to question it.

Now, you might claim that those responses aren't always reliable, accurate, or consistent, and that claim has a little more weight (though, in my experience, decreasingly so) - but it is _certainly_ not the case that you cannot interview an agent about choices made. I'm literally doing it every day.

reply
Sorry, I meant interviewing the PR author for certain choices.
reply
> Because companies are betting that this spending will allow them to reduce cost by firing people.

I've never worked at a company that didn't have a technical backlog measured in years.

reply
If they don't hire to get it done it means they don't think it's really important to get it done.
reply
That is an amazing point that invalidates the backlog in my mind. Stated vs revealed preferences in the end.
reply
Literally in the middle of ripping apart a vibe coded mess at work to figure out what's even worth keeping. Not fun :(
reply
Use AI to do that.
reply
What happens if you just keep vibe coding? Does it whack-a-mole fix one area and break another?
reply
It's so fucking bad. I'm watching a team try to maintain a huge dashboard/control application that interfaces with a large amount of hardware using solely AI workflows.

Literally nothing works: all the timers/time counters are different across the pages, it constantly commands hardware to do stupid shit, and it breaks during critical moments/in front of clients.

Eventually mgmt had to institute change freezes for high profile events because the team was breaking too much shit all the time.

The average C-suite dipshit doesn't realize that the performance drops off a cliff once your project is more than some fraction of the context window, so they will make pretty dashboards all day long, but once you need to cover all the edge cases of a real system it all explodes.

AI isn't trained on the type of software style we'll need to create systems using AI; it's trained on how we used to write software. It doesn't reuse code or elegantly structure anything, it just adds more code until the thing builds and passes some fake tests, even if half of it is functionally dead/unused.

reply