
I follow the same process. I have a design in mind for the problem at hand, but I don't reveal it to Codex. I go back and forth a bit to see if its proposals are better than mine. I go back and forth on tradeoffs of various approaches. And then I ask it to compare its proposals with mine. I "win" most of the time, but there are many times where it shows me a better or simpler approach, or makes me rethink the solution altogether.

Once this is done, the mechanical coding parts are mostly routine (for Codex).

by a_bonobo 11 hours ago

I really like this pattern and use it often, this 'not showing my cards'. The second I hint to the LLM what I prefer, it becomes sycophantic and invents nonsense about why my preferred solution is better.

I'm sure there's an interesting study on how users 'leak' their preference unintentionally to the LLM; perhaps when users list their options, they often put their preferred option first. But not showing the cards in my hand has been very useful when thinking through a problem with LLMs.
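For what it's worth, here is a minimal sketch of one way to avoid leaking a preference through listing order: shuffle the candidate options and give them neutral labels before pasting them into a prompt. It assumes nothing about any particular LLM API; the function name `build_neutral_prompt` and the prompt wording are made up for illustration.

```python
import random


def build_neutral_prompt(problem, options, seed=None):
    """Build a prompt that lists options in shuffled order with neutral labels.

    Hypothetical helper: it only formats text, it does not call any model.
    """
    rng = random.Random(seed)      # a seed makes the order reproducible
    shuffled = list(options)       # copy, so the caller's list is untouched
    rng.shuffle(shuffled)

    lines = [
        f"Problem: {problem}",
        "",
        "Candidate approaches (in no particular order):",
    ]
    for index, option in enumerate(shuffled, start=1):
        lines.append(f"Option {index}: {option}")
    lines.append("")
    lines.append("Compare the tradeoffs of each approach. I have not decided yet.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_neutral_prompt(
        "deliver notifications to users",
        ["poll the database", "use a message queue", "use webhooks"],
    ))
```

Whether shuffling actually reduces sycophancy is untested here; it only removes one possible signal.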

by cold_harbor 7 hours ago

LLMs flip positions when users push back ~70% of the time, even when they were right. RLHF optimizes for approval, not correctness.

by 8cvor6j844qw_d6 6 hours ago

> LLMs flip positions when users push back

Same experience. Claude rarely pushes back once you give a plausible/logical reason for your initial decision, even if it flagged concerns at first.

by freedomben 6 hours ago

I have noticed this as well, but I think it's somewhat a good thing. I know what I want for my application more than Claude does for example, especially when it comes to what's in production.

An example from earlier: Claude strongly suggested a migration that would run a full vacuum on Postgres. However, in production this would lock tables, which would grind the application to a halt. After I informed Claude that there were millions of rows in production, it accepted that and helped me get to the right thing.

Another example: I'm developing a TOTP authentication app because I'm dissatisfied with all those that I've tried. I want something strictly local, with a very easy use case when you have dozens or even a hundred or more accounts on there, and that is also efficient when left open for long periods of time. Claude strongly suggested that we force users to encrypt their vault with a passphrase all the time. However, this makes the CLI extremely painful to use if you are using a strong passphrase. I told Claude about the user experience impacts and that I wanted to allow users to optionally use a vault with no passphrase encryption, and it accepted that and suggested, as a middle ground, that we have a checkbox for the user to explicitly acknowledge that they're creating an unencrypted vault on disk. This is the right thing IMHO.

by epolanski 5 hours ago

Skills help there.

I have a linus-reviewer skill that focuses on architectural integrity, no BS, etc., modeled on Torvalds's code preferences.

And I have an enrico-reviewer one (I'm Enrico), that focuses on correct design, strict typing, simplification.

They have different prios, but they both push back on feedback, till you convince them.

by bitexploder 6 hours ago

I almost always end my prompt with something like “, but I am not sure, evaluate.” or similar wording, and I avoid ever stating a preference.

by jerf 6 hours ago

I don't think that "fixes" the problem, but it does seem to help. I have also found that adding "please feel free to ask questions" seems to help stop it from making an assumption and spinning merrily onward for tens of thousands of tokens based on a bad idea rather than asking you something. I theorize this is because the training and refinement data overprioritize one-shot solutions, both because that's easier to evaluate at training time and because it improves their benchmarks. But I emphasize the italicized words because that's all gut feel and I can't prove any of it.

by DenisM 3 hours ago

An interesting thing about sycophancy is that it's asymmetric. If an LLM is used to train an LLM, it may not have the same level of aggressiveness that humans do when pushing back on the trainee. Human pushback has specific patterns, which we might be able to compensate for because of this asymmetry.

by throwaway7783 2 hours ago

Obviously this is just my experience, but Claude Code pushes back much harder than Codex.

by cdelsolar 7 hours ago

Tangentially related, but I’ve been using Claude to practice interviewing on system design problems, and it’s actually pretty great. But even when it likes my answers, it always finds something, however small, to push on. Once it was actually completely wrong and admitted it after I got it to realize that. So maybe you have to prime it to be contrary and not agree with everything you say; putting it in the role of a tough interviewer seems to do this implicitly.

by DenisM 3 hours ago

Take a look at hellointerview.com; their model is very stubborn, similar to some interviewers who refuse to acknowledge even valid solutions that differ from the canon.

No affiliation.

by williamdclt 9 hours ago

Same. Alternatively (or in addition), I sometimes present my preferred idea as being a "bad/naive/stupid option" (or a suggestion from someone who can't be trusted) to see how it stands up when the sycophancy is pointed toward it being bad. As expected, the LLM will usually say "yeah it's bad!" and give plausible-sounding reasons for it, but if these reasons are nonsensical, it's a good sign that I'm not missing anything.

by nickcw 11 hours ago

LLMs are very prone to priming in my experience. That is the human psychology name for what you are describing; whether it should be applied to LLMs I don't know, but it describes the phenomenon perfectly.

by avadodin 9 hours ago

It's not limited to arguing with LLMs, but if you want an honest opinion you should remember to push back even when it agrees with your hidden preference at first. Sometimes it is only being contrarian or supporting the underdog. Steelman the opposition.

by yread 12 hours ago

> I go back and forth a bit to see if its proposals are better than mine

I find it useful to let it generate benchmarks comparing the approaches. Turns out AI is terrible at guessing what's faster or allocates less.
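A rough sketch of what such a benchmark can look like in Python, using only the standard library (`timeit` for timing, `tracemalloc` for peak allocation). The two candidate functions are placeholders I made up; swap in the approaches actually being compared. The numbers will vary by machine, so treat the output as a comparison, not an absolute measurement.

```python
import timeit
import tracemalloc


def measure(func, repeat=5, number=1000):
    """Return (best seconds per call, peak bytes traced during one call)."""
    # Take the fastest of several runs to reduce noise from other processes.
    best = min(timeit.repeat(func, repeat=repeat, number=number)) / number

    # Trace allocations for a single call and record the peak.
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak


def approach_with_list():
    # Placeholder candidate: materializes the whole list before summing.
    return sum([i * i for i in range(1000)])


def approach_with_generator():
    # Placeholder candidate: sums lazily without building a list.
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    for candidate in (approach_with_list, approach_with_generator):
        seconds, peak = measure(candidate)
        print(f"{candidate.__name__}: {seconds * 1e6:.1f} us/call, peak {peak} bytes")
```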

by chris_st 8 hours ago

Yup, just like people!

by puilp0502 8 hours ago

> Turns out AI is terrible at guessing what's faster or allocates less

s/AI/a human being/ would work equally well, lol.

Jokes aside, I do like the approach of letting the AI build something deterministic and make decisions based on that.

by hackermanai 13 hours ago

I think this approach is more common for actual work than the hype suggests. I do something similar: a lot of back and forth, then I settle on something, often with now-known tradeoffs, and write it by hand as a final guard to spot issues, keep naming consistent, etc.

by revv00 10 hours ago

I bet you've contributed a lot of training trajectories for those AIs.

by chris_st 8 hours ago

Good!

by daniel3303 8 hours ago

[flagged]

by mikepurvis 16 hours ago

Despite the cynical sibling reply, I also feel like there's real value here. Contrary to the meme, I don't think Claude just tells me I'm brilliant, but really does push back on directions that are unproductive, helps identify when a part is overcomplicated or a dependency has become redundant, etc. Those are important things to have at least a sightline on before getting too deep into the code, even (or maybe especially) in a world where an awful lot of code can be created basically for free.

by noduerme 14 hours ago

I'm usually the one spotting redundancies and dead branches in Claude's code, not the other way around. But I think either way, what's important is questioning the process and understanding the way the code is working so that you retain a full mental model.

by lintfordpickle 10 hours ago

>> and still largely understand the code [...] ,that, I feel has made me a better engineer

The cynic in me would say that a good engineer should fully understand the code they write.

I'm not suggesting that AI is the problem here - you could vibe code with the AI and have it explain the reasoning and patterns - or else tell it to use 'simpler' patterns from the outset. For any one problem in software engineering, there are always multiple solutions; some slower, some faster, some more flexible etc. The code you produce should, imo, be at a level that you can understand.

How can you reason about code you don't fully understand? How can you judge the future impact (technical debt and the cost of maintenance) of your projects?

AI makes it easier to get yourself into problems early on.

by jnovek 8 hours ago

> How can you reason about code you don't fully understand?

We all do, though. It takes months for a human to really get to know a project and, unless you’re working at a small startup, you’ll probably never know most of the code outside the corner you work in.

by silon42 6 hours ago

Yes, this is why bugs often get worked around instead of being fixed properly.

by bottlepalm 15 hours ago

One strategy I use in the planning phase: even when I know how I'd implement the solution, I ask Claude/Codex how they would solve the problem or implement the feature without giving them any clues, and then compare their solutions to my own. Often I am pleasantly surprised by alternative ways of doing things and ideas that we integrate into the final design.

by didericis 14 hours ago

Same. I've been creating "research" documents where I let it do a freeform survey of possible solutions/have it sketch out its own solution. I'll then sketch out a plan based on what I think is good or what I think it missed, and then I'll have it interrogate me for a final PRD document. It then implements the feature in reviewable chunks, and I'll give it feedback or tweak the PRD doc as needed.

Finally feel like I have a good workflow where I can fully benefit from these things without sacrificing my understanding of what they're doing.

by codebolt 12 hours ago

Same here. Step 1 is usually a research doc where I simply describe the task and tell it to research the relevant parts of the codebase. This gets refined to a high-level plan, which gets distilled to a detailed step-by-step implementation plan.

When it comes to the actual implementation I prefer to work through it in small steps, where the AI explains to me exactly what it's about to do and why (and I approve) along the way. This enables me to catch it if it's about to do something I disagree with beforehand. And reduces the time I need to spend reviewing in the end.

by ddp26 5 hours ago

I like this, though it does leave me feeling more nervous when I really don't know how I'd solve the problem; it still requires trust.

by rdedev 14 hours ago

How would you approach this problem if you are, let's say, token-constrained due to per-month limits set by your company?

What I've tried to do is make the bot write detailed spec documents, slowly building them over time as I explain the full problem.

It works for the most part, but if you have some non-standard requirement, the agent seems to skip over that part of the spec document when it starts to code. Or it adds needless checks for situations that I said will never happen.

by anywhichway 13 hours ago

In my book, the single most effective way to spend tokens is having it review code/specs you've written. One advantage to putting the AI in that position is that unreliable competence isn't much of a problem, as you can ignore bad suggestions.

I would also recommend explaining the specs and doing a lot of your back and forth with a lower-end model, and switching to a higher-end model only once the conversation history has all the context you feel the higher-end model needs.

by brabel 12 hours ago

As the post says, after an agent implements the plan, have another agent review it. Make sure to mention it must ensure the plan is fully executed. It works wonders!

by anon7000 12 hours ago

[flagged]

by jylefv 10 hours ago

I also like doing this exact thing. I really don't like using any AI-powered IDEs, but AI is still too useful to skip, so what I do is just open up a Claude or Gemini chat, explain the project, and start talking about implementations, feature additions, and how systems should be structured. Most of the time, as long as you don't let the AI be too biased towards your answers, it'll give actually good answers that help immensely with the project.

by qsera 16 hours ago

>I argue about design and architecture all day with a robot.

You will outgrow it at some point.

by Terretta 16 hours ago

Or learn something at some point.

https://en.wikipedia.org/wiki/Rubber_duck_debugging

by stuaxo 7 hours ago

Yes, this is the way I do stuff.

Try and learn at every point.

by bartread 16 hours ago

I think this is OK though. We can still micromanage[0] the code generation part for a useful productivity boost, I think.

[0] At least, in my experience, "micromanaging" the AI is what gives me the best results. Iterating on the initial design, then iterating on the plan, then reviewing the proposed code changes (including tests), then getting an independent code review from another LLM, etc. If you give an LLM too much latitude, that's when the really shitty code and ill-considered breaking changes/obliteration of existing functionality start to creep in.

by rf_physics 9 hours ago

I feel like there's an overly negative vibe to this response when it just seems like rubber duck debugging - I would assume the user isn't trying to argue like how you might have to argue specs, but is merely trying to clarify their own ideas and learn possible alternatives.

by estetlinus 13 hours ago

Quite the opposite. It’ll most likely “outgrow” us.

by Applejinx 8 hours ago

Can't, it ain't nothing BUT us.

You can wait and see, but that's what'll happen. If we stop it stops.

by busterarm 16 hours ago

nullsanity's comment is dead and downvoted to oblivion but also incredibly underrated.

I was more annoyed than anything that I didn't hit this moment until my 40s.

Except it's not just reddit (I quit reddit 15 years ago). It's the whole internet.

by vasco 15 hours ago

What you guys don't understand is that you don't argue with people or robots to teach them. You argue to teach yourself. Until you get out of that mindset, indeed a lot of conversation will seem useless, be it people or robots.

by qsera 14 hours ago

>You argue to teach yourself.

Oh. I am aware. It is not that deep. But who you argue with still matters. There was a point where I abandoned Reddit and HN. I came back to HN because people here also seem to have grown up. Reddit stays mostly the same.

I credit the moderation here for that, I mean allowing people to grow out of the echo chamber.

by BillStrong 14 hours ago

It does to an extent. One thing I will give AI, because of the nature of LLMs, you are essentially arguing with the median level of the input that trained the model. So, for someone new to the subject, you get access to patterns that will bring them up to a certain level.

Getting past that is the problem we face now.

by stuaxo 7 hours ago

That may well need more than the models. Someone put it better than me: these LLMs have no taste, nor can they as things are.

by redsocksfan45 2 hours ago

[dead]

by qsera 15 hours ago

>nullsanity's comment is dead and downvoted to oblivion but also incredibly underrated.

Yes, I thought the same as well because that was the same line of thought that made me write my comment.

>Except it's not just reddit (I quit reddit 15 years ago). It's the whole internet.

Yea, they are like a slingshot. You need to let go at some point or else it will drag you back.

by nullsanity 16 hours ago

It's like that phase people go through where they argue with morons on reddit, and then one day grow up and realize that most of these people are unemployed/underemployed terminally online nobodies who aren't ever going to learn anything, and even if they did it wouldn't impact the world, since they were just some below-average hobbyist anyway and aren't in charge of anything more important than a box of paperclips.

by dash2 14 hours ago

Ah, if it’s a robot in charge of the paperclips you need to watch out a bit.

by theK 12 hours ago

Mostly with you, though in recent years I have wondered whether those people are part of what caused the latest boom of political populism. If there is no one there to debate the problematic ideas, problematic ideas will become the rhetoric after all.

by TeMPOraL 12 hours ago

That might be true on general-population social media, but the opposite is the case in niche groups, and in particular, this very industry we're in - software - was largely built on terminally online hobbyists.

by nullsanity 2 hours ago

[dead]

by redsocksfan45 2 hours ago

[dead]

by jiri 10 hours ago

I think that many AIs nowadays have a similar process incorporated in their thinking blocks. You can see there how the model discusses implementation details with itself, so such discussions happen even when no human participates in the loop.

by nihsett 10 hours ago

Yeah, me too. I argue with multiple models at the same time via a markdown doc to coordinate the discussion. I feel like it makes me less anxious about the final output if nothing else.

by pcoyne 5 hours ago

Yeah, I feel like rubber ducking with some feedback has been very helpful.

by vatsachak 15 hours ago

I agree with this take. But this take also means that actual productive token use is not as high as people currently make it out to be.

AI is an excellent rubber duck and test writer. Maybe I sniff my farts too much but I like my code just the way I want it lol

by aaroninsf 2 hours ago

This.

This is what I tell people (including non-programmers interested in vibe coding): the results you get are a product of... process. Formal process.

From this naturally emerges the other thing I tell people: domain expertise (or at least familiarity and/or a capacity for learning) is still determinative of outcome.

I don't touch the code. But I do push back on expedience, laziness, inconsistency, and all the other recurring unsolved problems of generated code... and continue to play whack-a-mole in pursuit of process that whacks the moles.

by pj_mukh 9 hours ago

The professionalization of rubber ducking. I like it.

by epolanski 5 hours ago

Yet so many internet users seem to only understand "hand crafted" vs "vibe coded", as if there weren't tons of middle ground and different uses.

by deaton 6 hours ago

I think this is honestly the #1 best use case for AI in development. If you use it right it can be exactly the annoying junior who questions every decision you make that you need.