I follow the same process. I have a design in mind for the problem at hand, but I don't reveal it to Codex. I go back and forth a bit to see if its proposals are better than mine. I go back and forth on tradeoffs of various approaches. And then I ask it to compare its proposals with mine. I "win" most of the time but there are many times where it shows me a better or simpler approach, or makes me rethink the solution altogether.

Once this is done, the mechanical coding parts are mostly routine (for Codex).

reply
I really like this pattern and use it often, this 'not showing my cards'. The second I hint towards the LLM what I prefer it will become sycophantic and invent nonsense why my preferred solution is better.

I'm sure there's an interesting study on how users 'leak' their preference unintentionally to the LLM; perhaps when users list their options, they often put their preferred option first. But not showing the cards in my hand has been very useful when thinking through a problem with LLMs.
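
A minimal sketch of one way to avoid leaking a preference through option order, assuming you assemble the prompt in code before sending it. The function name `build_neutral_prompt` and the prompt wording are hypothetical, not part of any tool mentioned here; whether shuffling actually reduces sycophancy is untested.

```python
import random
import string


def build_neutral_prompt(problem, options, seed=None):
    """Build a design-review prompt that does not hint at a preferred option.

    Hypothetical helper: options are shuffled so that list position carries
    no signal, and the wording never states a preference.
    """
    rng = random.Random(seed)
    shuffled = list(options)  # copy so the caller's ordering is untouched
    rng.shuffle(shuffled)

    lines = [f"Problem: {problem}", "", "Candidate approaches:"]
    # zip stops at 26 options, which is plenty for a design discussion
    for label, option in zip(string.ascii_uppercase, shuffled):
        lines.append(f"{label}. {option}")
    lines.append("")
    lines.append(
        "Compare the tradeoffs of each approach and recommend one. "
        "Do not assume I prefer any of them."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_neutral_prompt(
        "Deliver notifications to users",
        ["polling", "webhooks", "message queue"],
    ))
```

Passing a fixed `seed` makes the ordering reproducible, which helps if you want to rerun the same prompt against several models.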

reply
LLMs flip positions when users push back ~70% of the time, even when they were right. RLHF optimizes for approval, not correctness.
reply
> LLMs flip positions when users push back

Same experience. Claude rarely pushes back once you give a plausible/logical reason for your initial decision, even if it flagged concerns at first.

reply
I have noticed this as well, but I think it's somewhat a good thing. I know what I want for my application more than Claude does for example, especially when it comes to what's in production.

An example from earlier: Claude strongly suggested a migration that would run a full vacuum on Postgres. However, in production this would lock tables, which would grind the application to a halt. After I informed Claude that there were millions of rows in production, it accepted that and helped me get to the right thing.

Another example: I'm developing a TOTP authentication app because I'm dissatisfied with all those that I've tried. I want something strictly local, with a very easy use case when you have dozens or even a hundred or more accounts on there, that is also efficient when left open for long periods of time. Claude strongly suggested that we force users to encrypt their vault with a passphrase all the time. However, this makes the CLI extremely painful to use if you are using a strong passphrase. I told Claude about the user experience impacts and that I wanted to allow users to optionally use a vault with no passphrase encryption. It accepted that and suggested, as a middle ground, that we have a checkbox for the user to explicitly acknowledge that they're creating an unencrypted vault on disk. This is the right thing IMHO.
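
A rough sketch of the "explicit acknowledgement" middle ground, assuming a Python implementation; this is not the app's actual code. The names `prepare_vault_key`, `VaultKey`, and `UnencryptedVaultNotAcknowledged` are hypothetical, and PBKDF2 with these parameters is only an illustrative choice of key derivation. Only the opt-in gate is shown; the encryption itself is left out.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional


class UnencryptedVaultNotAcknowledged(Exception):
    """Raised when no passphrase is given and the user has not opted in."""


@dataclass
class VaultKey:
    encrypted: bool
    salt: Optional[bytes]
    key: Optional[bytes]


def prepare_vault_key(
    passphrase: Optional[str],
    acknowledge_unencrypted: bool = False,
    iterations: int = 600_000,  # assumed work factor, tune for your hardware
) -> VaultKey:
    """Derive a key if a passphrase is given; otherwise require explicit opt-in."""
    if passphrase:
        salt = os.urandom(16)
        key = hashlib.pbkdf2_hmac(
            "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
        )
        return VaultKey(encrypted=True, salt=salt, key=key)

    # No passphrase: refuse unless the user ticked the acknowledgement box.
    if not acknowledge_unencrypted:
        raise UnencryptedVaultNotAcknowledged(
            "Creating a vault without a passphrase stores secrets in plain "
            "text on disk; pass acknowledge_unencrypted=True to confirm."
        )
    return VaultKey(encrypted=False, salt=None, key=None)
```

The point of the design is that the unencrypted path is never the silent default: the caller has to pass the acknowledgement flag on purpose.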

reply
Skills help there.

I have a linus-reviewer skill that focuses on architectural integrity, no BS, etc., modeled on Torvalds' code preferences.

And I have an enrico-reviewer one (I'm Enrico), that focuses on correct design, strict typing, simplification.

They have different prios, but they both push back on feedback, till you convince them.

reply
I almost always end with something like “, but I am not sure, evaluate.” or something similar, and avoid ever stating a preference.
reply
I don't think that "fixes" the problem, but it does seem to help. I also have found adding "please feel free to ask questions" seems to help it stop from making an assumption and spinning merrily onward for tens of thousands of tokens based on a bad idea rather than asking you something. I theorize this is because the training and refinement data overprioritize one-shot solutions, both because that's easier to evaluate at training time and improves their benchmarks. But I emphasize the italicized words because that's all gut feel and I can't prove any of it.
reply
An interesting thing about sycophancy is that it's asymmetric. If an LLM is used to train an LLM, it may not have the same level of aggressiveness that humans do when pushing back on the trainee. Human pushback has specific patterns, which we might be able to compensate for due to the asymmetry.
reply
Obviously this is just my experience. Claude Code pushes back much harder than Codex.
reply
Tangentially related, but I’ve been using Claude to practice interviewing on system design problems, and it’s actually pretty great. But even when it likes my answers it always finds something, however small, to push on. Once it was actually completely wrong and admitted it after I got it to realize that. So maybe you have to prime it to be contrary and not agree with everything you say; putting it in the role of a tough interviewer seems to do this implicitly.
reply
Take a look at hellointerview.com. Their model is very stubborn, similar to some interviewers who refuse to acknowledge even valid solutions that differ from the canon.

No affiliation.

reply
Same. Alternatively (or in addition), I sometimes present my preferred idea as being a "bad/naive/stupid option" (or a suggestion from someone who can't be trusted) to see how it stands up when the sycophancy pushes toward calling it bad. As expected, the LLM will usually say "yeah it's bad!" and give plausible-sounding reasons for it, but if these reasons are nonsensical it's a good sign that I'm not missing anything.
reply
LLMs are very prone to priming in my experience. That is the human psychology name for what you are describing; whether it should be applied to LLMs I don't know, but it describes the phenomenon perfectly.
reply
It's not limited to arguing with LLMs, but if you want an honest opinion you should remember to push back even when it agrees with your hidden preference at first. Sometimes it is only being contrarian or supporting the underdog. Steelman the opposition.
reply
> I go back and forth a bit to see if its proposals are better than mine

I find it useful to let it generate benchmarks comparing the approaches. Turns out AI is terrible at guessing what's faster or allocates less.
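
A minimal sketch of that kind of benchmark, assuming Python and only the standard library (`timeit` for time, `tracemalloc` for peak allocation). The helper `measure` and the two string-building candidates are made up for illustration; swap in whichever approaches you are actually comparing.

```python
import timeit
import tracemalloc


def measure(func, *args, repeat=5, number=100):
    """Return (best seconds per call, peak bytes allocated during one call)."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    best = min(timings) / number  # the fastest run is the least noisy one

    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return best, peak


# Two hypothetical candidate implementations to compare.
def join_concat(n):
    out = ""
    for i in range(n):
        out += str(i)
    return out


def join_builtin(n):
    return "".join(str(i) for i in range(n))


if __name__ == "__main__":
    for candidate in (join_concat, join_builtin):
        seconds, peak = measure(candidate, 10_000)
        print(f"{candidate.__name__}: {seconds * 1e6:.1f} us/call, peak {peak} bytes")
```

The numbers depend on the machine and interpreter version, so treat them as a comparison between candidates on your own setup rather than as absolute figures.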

reply
Yup, just like people!
reply
> Turns out AI is terrible at guessing what's faster or allocates less

s/AI/a human being/ would work equally well, lol.

Jokes aside, I do like the approach of letting the AI build something deterministic and make decisions based on that.

reply
I think this approach is more common for actual work than the hype suggests. I do something similar: a lot of back and forth, then I settle on something, often with now-known tradeoffs, and write it by hand as a final guard to spot issues, keep naming consistent, etc.
reply
I bet you've contributed a lot of training trajectories for those AIs.
reply
Good!
reply
[flagged]
reply
Despite the cynical sibling reply, I also feel like there's real value here. Contrary to the meme, I don't think Claude just tells me I'm brilliant, but really does push back on directions that are unproductive, helps identify when a part is overcomplicated or a dependency has become redundant, etc. Those are important things to have at least a sightline on before getting too deep into the code, even (or maybe especially) in a world where an awful lot of code can be created basically for free.
reply
I'm usually the one spotting redundancies and dead branches in Claude's code, not the other way around. But I think either way, what's important is questioning the process and understanding the way the code is working so that you retain a full mental model.
reply
>> and still largely understand the code [...] ,that, I feel has made me a better engineer

the cynic in me would say that a good engineer should fully understand the code you write.

I'm not suggesting that AI is the problem here - you could vibe code with the AI and have it explain the reasoning and patterns - or else tell it to use 'simpler' patterns from the outset. For any one problem in software engineering, there are always multiple solutions; some slower, some faster, some more flexible, etc. The code you produce should, imo, be at the level that you can understand it.

How can you reason about code you don't fully understand? How can you judge the future impact (technical debt and the cost of maintenance) of your projects?

AI makes it easier to get yourself into problems early on.

reply
> How can you reason about code you don't fully understand?

We all do, though. It takes months for a human to really get to know a project and, unless you’re working at a small startup, you’ll probably never know most of the code outside the corner you work in.

reply
Yes, this is why bugs often get worked around instead of being fixed properly.
reply
One strategy I use in the planning phase is that even when I know how I'd implement the solution, I ask Claude/Codex how they would solve the problem or implement the feature without giving them any clues - and then compare their solutions to my own. Often I am pleasantly surprised by alternative ways of doing things and ideas that we integrate into the final design.
reply
Same. I've been creating "research" documents where I let it do a freeform survey of possible solutions/have it sketch out its own solution. I'll then sketch out a plan based on what I think is good or what I think it missed, and then I'll have it interrogate me for a final PRD document. It then implements the feature in reviewable chunks, and I'll give it feedback or tweak the PRD doc as needed.

Finally feel like I have a good workflow where I can fully benefit from these things without sacrificing my understanding of what they're doing.

reply
Same here. Step 1 is usually a research doc where I simply describe the task and tell it to research the relevant parts of the codebase. This gets refined to a high-level plan, which gets distilled to a detailed step-by-step implementation plan.

When it comes to the actual implementation I prefer to work through it in small steps, where the AI explains to me exactly what it's about to do and why (and I approve) along the way. This enables me to catch it if it's about to do something I disagree with beforehand. And reduces the time I need to spend reviewing in the end.

reply
I like this, though it does leave me feeling more nervous when I really don't know how I'd solve the problem; it still requires trust.
reply
How would you approach this problem if you are let's say token constrained due to per month limits set in your company?

What I've tried to do is make the bot write detailed spec documents, slowly building it over time as I explain the full problem.

It works for the most part, but if you have some non-standard requirement, the agent seems to skip over that part of the spec document when it starts to code. Or it adds needless checks for situations that I said will never happen.

reply
In my book, the single most effective way to spend tokens is having it review code/specs you've written. One advantage to putting the AI in that position is that unreliable competence isn't much of a problem, as you can ignore bad suggestions.

I would also recommend explaining the specs and doing a lot of your back and forth with a lower-end model, and switching to a higher-end model only once the conversation history has all the context you feel the higher-end model needs.

reply
As the post says, after an agent implements the plan, have another agent review it. Make sure to mention it must ensure the plan is fully executed. It works wonders!
reply
[flagged]
reply
I also like doing this exact thing. I really don't like using any AI-powered IDEs, but AI is still too useful, so what I do is just open up a Claude or Gemini chat, explain the project, and start talking about implementations, feature additions, and how systems should be structured. Most of the time, as long as you don't let the AI be too biased towards your answers, it'll give actually good answers that help immensely for the project.
reply
>I argue about design and architecture all day with a robot.

You will outgrow it at some point.

reply
Yes, this is the way I do stuff.

Try and learn at every point.

reply
I think this is OK though. We can still micromanage[0] the code generation part for a useful productivity boost, I think.

[0] At least, in my experience, "micromanaging" the AI is what gives me the best results. Iterating on the initial design, then iterating on the plan, then reviewing the proposed code changes (including tests), then getting an independent code review from another LLM, etc. If you give an LLM too much latitude that's when the really shitty code and ill-considered breaking changes/obliteration of existing functionality starts to creep in.

reply
I feel like there's an overly negative vibe to this response when it just seems like rubber duck debugging - I would assume the user isn't trying to argue like how you might have to argue specs, but is merely trying to clarify their own ideas and learn possible alternatives.
reply
Quite the opposite. It’ll most likely “outgrow” us.
reply
Can't, it ain't nothing BUT us.

You can wait and see, but that's what'll happen. If we stop it stops.

reply
nullsanity's comment is dead and downvoted to oblivion but also incredibly underrated.

I was more annoyed than anything that I didn't hit this moment until my 40s.

Except it's not just reddit (I quit reddit 15 years ago). It's the whole internet.

reply
What you guys don't understand is that you don't argue with people or robots to teach them. You argue to teach yourself. Until you get out of that mindset, indeed a lot of conversation will seem useless, be it people or robots.
reply
>You argue to teach yourself.

Oh. I am aware. It is not that deep. But who you argue with still matters. There was a point where I abandoned Reddit and HN. I came back to HN because people here also seem to have grown up. Reddit stays mostly the same.

I credit the moderation here for that, I mean allowing people to grow out of the echo chamber.

reply
It does to an extent. One thing I will give AI, because of the nature of LLMs, you are essentially arguing with the median level of the input that trained the model. So, for someone new to the subject, you get access to patterns that will bring them up to a certain level.

Getting past that is the problem we face now.

reply
That may well need more than the models. Someone put it better than me: these LLMs have no taste - nor can they as things are.
reply
>nullsanity's comment is dead and downvoted to oblivion but also incredibly underrated.

Yes, I thought the same as well because that was the same line of thought that made me write my comment.

>Except it's not just reddit (I quit reddit 15 years ago). It's the whole internet.

Yea, they are like a slingshot. You need to let go at some point or else it will drag you back.

reply
It's like that phase people go through where they argue with morons on reddit, and then one day grow up and realize that most of these people are unemployed/underemployed terminally online nobodies who aren't ever going to learn anything, and even if they did it wouldn't impact the world since they were just some below average hobbyist anyway and aren't in charge of anything more important than a box of paperclips.
reply
Ah, if it’s a robot in charge of the paperclips you need to watch out a bit.
reply
Mostly with you, though in recent years I have wondered whether those people are part of what caused the latest boom of political populism. If there is no one there to debate the problematic ideas, problematic ideas will become the rhetoric after all.
reply
That might be true on general-population social media, but the opposite is the case in niche groups, and in particular, this very industry we're in - software - was largely built on terminally online hobbyists.
reply
[dead]
reply
I think that many AIs nowadays have a similar process incorporated in their thinking blocks. You can see there how the model discusses implementation details with itself, so such discussion happens even when no human participates in the loop.
reply
Yeah, me too. I argue with multiple models at the same time via a markdown doc to coordinate the discussion. I feel like it makes me less anxious about the final output if nothing else.
reply
Yeah, I feel like rubber ducking with some feedback has been very helpful.
reply
I agree with this take. But this take also means that actual productive token use is not as high as people currently make it out to be.

AI is an excellent rubber duck and test writer. Maybe I sniff my farts too much but I like my code just the way I want it lol

reply
This.

This is what I tell people (including non-programmers interested in vibe coding): the results you get are a product of... process. Formal process.

From this naturally emerges the other thing I tell people: domain expertise (or at least, familiarity and/or capacity for learning) is still a determinant of outcome.

I don't touch the code. But I do push back on expedience, laziness, inconsistency, and all the other recurring unsolved problems of generated code... and continue to play whack-a-mole in pursuit of process that whacks the moles.

reply
The professionalization of rubber ducking. I like it.
reply
Yet so many internet users seem to only understand "hand crafted" vs "vibe coded", as if there weren't tons of middle ground and different uses.
reply
I think this is honestly the #1 best use case for AI in development. If you use it right it can be exactly the annoying junior who questions every decision you make that you need.
reply